Alignment algorithms
Online demos and tools
TeachEnG - educational tool for aslgnment algorithms, plylogenetic tree. - Kim, Minji, Yeonsung Kim, Lei Qian, and Jun S. Song. “TeachEnG: A Teaching Engine for Genomics.” Bioinformatics, October 15, 2017
textdistance - compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage
Alignment algorithms
Needleman, S. B., and C. D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology, March 1970
Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology, March 25, 1981
Burrows, Michael, and David J Wheeler. “A Block-Sorting Lossless Data Compression Algorithm,” 1994. - BWT paper
Ferragina, Paolo, and Giovanni Manzini. “Opportunistic Data Structures with Applications.” In Foundations of Computer Science, IEEE, 2000 - FM index paper
Li, Heng, and Richard Durbin. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics, July 15, 2009 - BWA first paper
Li, Heng, and Richard Durbin. “Fast and Accurate Long-Read Alignment with Burrows-Wheeler Transform.” Bioinformatics, March 1, 2010 - BWA second paper
Li, Heng. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM.” ArXiv, 2013. - BWA-MEM (maximal exact matches). Automatically select end-to-end or gapped alignment.
Langmead, Ben, and Steven L Salzberg. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods, March 4, 2012 - Bowtie2 gapped alignment
Dobin, Alexander, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R. Gingeras. “STAR: Ultrafast Universal RNA-Seq Aligner.” Bioinformatics, January 1, 2013 - STAR gapped aligner. Algorithm, testing on simulated dataset.
Kim, Daehwan, Ben Langmead, and Steven L. Salzberg. “HISAT: A Fast Spliced Aligner with Low Memory Requirements.” Nature Methods, April 2015 - HISAT aligner. Two indexes - one large with many small
Fonseca, Nuno A., Johan Rung, Alvis Brazma, and John C. Marioni. “Tools for Mapping High-Throughput Sequencing Data.” Bioinformatics, December 15, 2012 - Sequencing mappers, lists and characteristics of DNA, RNA, bisulfite aligners.
References
Nagarajan, Niranjan, and Mihai Pop. “Sequence Assembly Demystified.” Nature Reviews. Genetics, March 2013 - Gentle introduction into genome assembly. Technologies. Box2: Greedy, overlap-layout-consensus, De Bruijn. Problems
Pevzner, P. A., H. Tang, and M. S. Waterman. “An Eulerian Path Approach to DNA Fragment Assembly.” Proceedings of the National Academy of Sciences, August 14, 2001 - First de Bruijn graph for genome assembly paper. Idea of breaking reads into fragments. Typical approach reads are vertices connected by edges if they overlap. Hamiltonian path problem - visit each vertex exactly once, NP-complete. de Bruijn graph - overlapping fragments are edges, and the problem is Eulerian path - visit each edge once. Error-correction algorithm.
Compeau, Phillip E C, Pavel A Pevzner, and Glenn Tesler. “How to Apply de Bruijn Graphs to Genome Assembly.” Nature Biotechnology, November 8, 2011
Chaisson, Mark J. P., Richard K. Wilson, and Evan E. Eichler. “Genetic Variation and the de Novo Assembly of Human Genomes.” Nature Reviews Genetics, October 7, 2015 - Genome assembling strategies, problems. OLC, De Bruijn, string graphs. Types of gaps.
Miller, Jason R., Sergey Koren, and Granger Sutton. “Assembly Algorithms for Next-Generation Sequencing Data.” Genomics, June 2010 - Assembly tools for overlap/layout/consensus and the de Bruijn graph approaches. de Bruin graph Issues with genome assembly, potential solutions.
String Graph Assembler. Simpson, J. T., and R. Durbin. “Efficient de Novo Assembly of Large Genomes Using Compressed Data Structures.” Genome Research, March 1, 2012 - SGA - String Graph Assembler. From an FM-index. Velvet, ABySS, SOAPdenovo de Bruijn graph assemblers. BWA and FM explanation
Koren, Sergey, and Adam M. Phillippy. “One Chromosome, One Contig: Complete Microbial Genomes from Long-Read Sequencing and Assembly.” Current Opinion in Microbiology, February 2015 - Genome assembly overview focusing on long reads. Repeats (global and local) are problematic. Details on technologies: PacBio RS, Illumina’s Moleculo, ONT MinION. Assembling approaches: OLC, hierarchical hybrid (long reads correction using another technology) and non-hybrid (self long reads alignment-correction). Assembly augmentation: gap filling, scaffolding, read threading. Table 1 - long read assembly tools and descriptions.
Videos
Algorithms for DNA Sequencing - 55 short videos by Ben Langmead
MIT 7.91J Foundations of Computational and Systems - full lecture videos (approx 1h 20m), by Christopher Burge, David Gifford, Ernest Fraenkel
Vector illustrations of BWT based searching on small strings, https://github.com/ekg/drawbwt