Alignment algorithms

Online demos and tools

Alignment algorithms

References

  • Nagarajan, Niranjan, and Mihai Pop. “Sequence Assembly Demystified.” Nature Reviews Genetics, March 2013 - Gentle introduction to genome assembly. Sequencing technologies. Box 2: greedy, overlap-layout-consensus (OLC), and de Bruijn graph approaches. Assembly problems.

  • Pevzner, P. A., H. Tang, and M. S. Waterman. “An Eulerian Path Approach to DNA Fragment Assembly.” Proceedings of the National Academy of Sciences, August 14, 2001 - First paper on de Bruijn graphs for genome assembly. Idea of breaking reads into fragments. In the typical approach, reads are vertices connected by edges if they overlap, and assembly is a Hamiltonian path problem (visit each vertex exactly once), which is NP-complete. In the de Bruijn graph, overlapping fragments are edges, and the problem becomes an Eulerian path (visit each edge exactly once), which is solvable in linear time. Error-correction algorithm.

  • Compeau, Phillip E C, Pavel A Pevzner, and Glenn Tesler. “How to Apply de Bruijn Graphs to Genome Assembly.” Nature Biotechnology, November 8, 2011

  • Chaisson, Mark J. P., Richard K. Wilson, and Evan E. Eichler. “Genetic Variation and the de Novo Assembly of Human Genomes.” Nature Reviews Genetics, October 7, 2015 - Genome assembly strategies and problems. OLC, de Bruijn, and string graphs. Types of gaps.

  • Miller, Jason R., Sergey Koren, and Granger Sutton. “Assembly Algorithms for Next-Generation Sequencing Data.” Genomics, June 2010 - Assembly tools for the overlap/layout/consensus and de Bruijn graph approaches. Issues with genome assembly, potential solutions.

  • Simpson, J. T., and R. Durbin. “Efficient de Novo Assembly of Large Genomes Using Compressed Data Structures.” Genome Research, March 1, 2012 - SGA (String Graph Assembler), which builds the string graph from an FM-index. Velvet, ABySS, and SOAPdenovo as de Bruijn graph assemblers. BWT and FM-index explanation.

  • Koren, Sergey, and Adam M. Phillippy. “One Chromosome, One Contig: Complete Microbial Genomes from Long-Read Sequencing and Assembly.” Current Opinion in Microbiology, February 2015 - Genome assembly overview focusing on long reads. Repeats (global and local) are problematic. Details on technologies: PacBio RS, Illumina’s Moleculo, ONT MinION. Assembly approaches: OLC, hierarchical hybrid (long-read correction using another technology), and hierarchical non-hybrid (self-correction by aligning long reads against each other). Assembly augmentation: gap filling, scaffolding, read threading. Table 1 - long-read assembly tools and descriptions.
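
The de Bruijn graph idea noted under Pevzner et al. (2001) can be illustrated with a minimal Python sketch. It is not the algorithm from any of the papers above: it assumes error-free reads, ignores reverse complements and k-mer multiplicity, and assumes the genome has no repeated (k-1)-mers, so that exactly one Eulerian path exists. The function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) and the toy reads are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes (does not check) that an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # A path (as opposed to a cycle) starts at the node with one more outgoing
    # edge than incoming edges; otherwise start at any node with outgoing edges.
    start = next(
        (node for node in sorted(graph) if len(graph[node]) - in_deg[node] == 1),
        min(graph),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path (each edge used exactly once)."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]  # made-up overlapping reads
    print(assemble(toy_reads, 4))
```

With these toy reads and k = 4, the graph is a simple chain of seven edges, so the path spells a single 10-base sequence. Real assemblers have to handle sequencing errors, repeats, and both strands, which is where the error-correction and graph-simplification steps discussed in the references come in.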