Izvestiya of Saratov University.

Mathematics. Mechanics. Informatics

ISSN 1816-9791 (Print)
ISSN 2541-9005 (Online)


For citation:

Sergushichev A. A., Aleksandrov A. V., Kazakov S. V., Tsarev F. N., Shalyto A. A. Combining de Bruijn graphs, overlap graphs and microassembly for de novo genome assembly. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2013, vol. 13, iss. 2, pp. 51-57. DOI: 10.18500/1816-9791-2013-13-2-2-51-57, EDN: RHABJZ

This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0).
Published online: 
25.05.2013
Full text:
(downloads: 162)
Language: 
Russian
Heading: 
UDC: 
004.021
EDN: 
RHABJZ

Combining de Bruijn graphs, overlap graphs and microassembly for de novo genome assembly

Autors: 
Sergushichev Aleksei Aleksandrovich, Saint-Petersburg National Research University of Information Technologies
Aleksandrov Anton Vyacheslavovich, Saint-Petersburg National Research University of Information Technologies
Kazakov Sergei Vladimirovich, Saint-Petersburg National Research University of Information Technologies
Tsarev Fedor Nikolayevich, Saint-Petersburg National Research University of Information Technologies
Shalyto Anatoly Abramovich, Saint-Petersburg National Research University of Information Technologies
Abstract: 

 In this paper we present a method for de novo genome assembly that splits the process into three stages: quasicontig assembly; contig assembly from quasicontigs; contig postprocessing with microassembly. The first stage uses de Bruijn graph, the second one uses overlap graph. We have carried out experiments of assembling the E. Coli genome (size ≈4.5 Mbp) andMaylandia zebra genome (size ≈1 Gbp). Advantage of proposed method is a low memory consumption. 

References: 
  1. Illumina, Inc. Available at: http://www.illumina.com/ (Accessed 18, May, 2012).
  2. B¨ockenhauer H.-J., Bongrratz D. Algorithmic Aspects of Bioinformatics. Springer, 2007, 396 p.
  3. Pevzner P. A. 1-Tuple DNA sequencing: computer analysis. J. Biomol. Struct. Dyn. 1989. vol. 7, pp. 63–73.
  4. Zerbino D. R., Birney E. Velvet : Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 2008, vol. 18, pp. 821–829.
  5. Butler J., MacCallum I., Kleber M., Shlyakhter I. A., Belmonte M. K., Lander E. S., Nusbaum C., Jaffe D. B. ALLPATHS : De novo assembly of wholegenome shotgun microreads, Genome Research, 2008, vol. 18, pp. 810– 820.
  6. Simpson J. T., Wong K., Jackman S. D., Schein J. E., Jones S. J., Birol I. ABySS : A parallel assembler for short read sequence data. Genome Research, 2009, vol. 19, pp. 1117–1123.
  7. Li R., Zhu H., Ruan J., Qian W., Fang X., Shi Z., Li Y., Li S., Shan G., Kristiansen K., Li S., Yang H., Wang J., Wanget J. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 2010, vol. 20, pp. 265–272.
  8. Pevzner P. A., Tang H., Waterman M. S. EULER : An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci., 2001, no. 98, pp. 9748–9753.
  9. Aleksandrov A. V., Kazakov S. V., Melnikov S. V., Sergushichev A. A., Tsarev F. N., Shalyto A. A. Errors Correction Method in the Readings Set of Nucleotide Sequence. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2011, no. 5, pp. 81–84 (in Russian).
  10. Okanohara D., Sadakane K. Practical entropy-compressed rank/select dictionary. Comput. Research Reposit., 2006. Available at: http://arxiv.org/abs/cs/0610001 (Accessed 18, May, 2012).
  11. Chikhi R., Rizk G. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms in Bioinformatics, 2012, pp. 236–248.
  12. Gusfield D. Algorithms on String, Trees and Sequences. Computer Science and Computational Biology. Cambridge Univ. Press, 1997, 554 p. (Rus. ed.: Gusfild D. Stroki, derev’ia i posledovatel’nosti v algoritmakh. Informatika i vychislitel’naia biologiia. St. Petersburg, Nevskii dialekt Publ., 2003, 656 p.). 13. The Assemblathon. Available at: http://www.assemblathon.org (Accessed 18, May, 2012).
Received: 
17.11.2012
Accepted: 
26.04.2013
Published: 
31.05.2013
Short text (in English):
(downloads: 78)