Izvestiya of Saratov University. Mathematics. Mechanics. Informatics

ISSN 1816-9791 (Print)
ISSN 2541-9005 (Online)

For citation:

Tverdokhlebov V. A., Kariakin D. A. Classification and Recognition of Structures of Genetic Sequences. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2019, vol. 19, iss. 3, pp. 338–350. DOI: 10.18500/1816-9791-2019-19-3-338-350, EDN: VQJQOM

This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0).

Classification and Recognition of Structures of Genetic Sequences

Tverdokhlebov Vladimir Aleksandrovich, Saratov State University
Kariakin Denis A., Saratov State University

To address the problem of determining relationships between the properties of organisms and the properties of their genetic sequences, we propose a classification of genetic sequences based on numerical indicators of recurrent and Z-recurrent shapes, which define the structure of functional relationships between the elements of a sequence. Based on these numerical indicators, we introduce a method for classifying genetic sequences. For each biological rank considered in the recognition problem (a rank that has a meaningful interpretation in the application area), we compare a numerical characteristic that generalizes the numerical values with the numerical characteristics of the recurrent or Z-recurrent shapes that determine the structure of each sequence of that rank. The recognition problem is considered from two points of view: determining whether a sequence belongs to a specific rank of sequences, and determining which group of sequences contains the experimental sequence. The main mathematical difficulty in solving these recognition problems lies in finding differences between the numerical representations of the recurrent and Z-recurrent shapes of experimental sequences. To overcome this difficulty, we construct a spectrum of numerical indicators of recurrent and Z-recurrent shapes. Classification and recognition are illustrated by an example with three ranks of genetic codes of organisms, each represented by five sequences. The Z-recurrent shape is introduced to refine and extend the classification of sequences and to increase the efficiency of the recognition methods.
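The abstract does not state the formal definitions of recurrent and Z-recurrent shapes or of their numerical indicators, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a sequence obeys a Z-recurrent rule for an offset set Z when the symbol at every position i is uniquely determined by the symbols at positions i - z, z in Z (the ordinary recurrent case being Z = {1, ..., m}), and it uses the share of positions so determined as a hypothetical indicator. A vector of such indicators over several orders stands in for the "spectrum", and recognition uses a nearest-centroid rule, a swapped-in technique chosen only for simplicity. The names `determinacy`, `spectrum`, and `recognize` and the toy sequences are illustrative.

```python
from collections import defaultdict
from math import dist
from statistics import fmean
from typing import DefaultDict, Dict, List, Optional, Sequence, Set, Tuple


def determinacy(seq: str, offsets: Tuple[int, ...]) -> float:
    """Hypothetical indicator (an assumption, not the paper's definition):
    the share of positions i whose context (seq[i - z] for z in offsets) is
    followed by exactly one symbol everywhere in seq. A value of 1.0 means
    seq obeys a single Z-recurrent rule for this offset set Z."""
    span = max(offsets)
    followers: DefaultDict[Tuple[str, ...], Set[str]] = defaultdict(set)
    contexts: List[Tuple[str, ...]] = []
    for i in range(span, len(seq)):
        ctx = tuple(seq[i - z] for z in offsets)
        followers[ctx].add(seq[i])
        contexts.append(ctx)
    if not contexts:
        return 1.0  # sequence too short to contradict any rule
    determined = sum(1 for ctx in contexts if len(followers[ctx]) == 1)
    return determined / len(contexts)


def spectrum(seq: str, max_order: int) -> List[float]:
    """Hypothetical 'spectrum': determinacy for the ordinary recurrent
    offset sets Z = {1, ..., m}, m = 1..max_order."""
    return [determinacy(seq, tuple(range(1, m + 1))) for m in range(1, max_order + 1)]


def recognize(sample: str, ranks: Dict[str, Sequence[str]], max_order: int = 4) -> Optional[str]:
    """Assign sample to the rank whose mean spectrum is nearest in
    Euclidean distance (illustrative nearest-centroid rule)."""
    target = spectrum(sample, max_order)
    best_rank: Optional[str] = None
    best_d = float("inf")
    for name, members in ranks.items():
        spectra = [spectrum(s, max_order) for s in members]
        centroid = [fmean(col) for col in zip(*spectra)]
        d = dist(target, centroid)
        if d < best_d:
            best_rank, best_d = name, d
    return best_rank


if __name__ == "__main__":
    # Toy groups, not real genetic data.
    toy_ranks = {
        "periodic": ["ACGTACGTACGT", "AGAGAGAGAGAG"],
        "irregular": ["AACGTTGCAAGT", "GATTACAGGCTA"],
    }
    print(recognize("ACGACGACGACG", toy_ranks))  # prints: periodic
```

In this toy run the period-3 sample is fully determined by its preceding symbols at every order, so its spectrum coincides with the centroid of the periodic group. A real experiment would substitute the paper's own indicators and genuine genetic sequences.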
