+ - 0:00:00
Notes for current slide
Notes for next slide

Needleman-Wunsch global alignment

Mikhail Dozmorov

Virginia Commonwealth University

02-17-2021

1 / 10

Alignment goals

  • Homology - sequence similarity - helps to infer functions of genes uncharacterized in one organism but known in another

  • Global sequence alighment (Needleman-Wunch)

  • Gapped local sequence alignment (Smith-Waterman)

2 / 10

Needleman-Wunsch algorithm

Needleman, S. B., and C. D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins90057-4).” Journal of Molecular Biology, (March 1970)

3 / 10

Dynamic programming

  • The Needleman-Wunsch algorithm is an example of dynamic programming, a discipline invented by Richard E. Bellman in 1953

  • Dynamic programming - solving complex problems by breaking them down into simpler subproblems, and remember subproblem solutions

    • Guaranteed to provide the optimal alignment for a given set of scoring functions
    • Slow due to the very large number of computational steps \(O(n^2)\)
4 / 10

Needleman-Wunsch algorithm

  • In its simplest form, assign a value \(A\) where the aligned pair consists of the same letters (nucleotides, amino acids)
    • If the letters differ, subtract \(B\) - edit penalty.
    • If a gap needs to be made, subtract a gap penalty times the number of gaps
5 / 10

Steps

  1. Initialization of the score matrix \(D(i,j)\)
  2. Calculation of scores, such that

$$ D(i,j) = max \begin{cases} D(i-1,j-1) + s(i,j) \\ D(i-1,j) + g \\ D(i,j-1) + g \end{cases} $$

Where \(s(i,j)\) is the substitution score for entries \(i\) and \(j\), and \(g\) is the gap penalty

Once you construct the matrix, you can trace back the best score for an alignment

6 / 10

Best possible alignment for two sequences

  • The N-W algorithm is mathematically proven to find the best alignment of two sequences

  • N-W algorithm takes \(O(n^2)\) to find best alignment of \(n\) letters in two sequences

  • Accessing all possible alignments one by one would date \(\binom{2n}{n}\), so \(n^2\) is much smaller

7 / 10

Sequence searching and alignment

  • FASTA - a DNA and protein sequence alignment software package by David J. Lipman and William R. Pearson in 1985.

  • BLAST (Basic Local Alignment Search Tool) - an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA sequences.

    • Designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David J. Lipman
    • Innovation: heuristic database search (speed), followed by optimal alignment (accuracy, statistics)
8 / 10

BLAST

  • BLAST is not used for NGS because it is too slow.

  • Format for command line version: blastall -d assemblyfasta -i genefasta -o output.blast -p blastn -e 1e-15

    • -i indicates what is the gene file
    • -o indicates what you want the output to be
    • -p with ending "n" means nucleotide alignment -e statistical significance of alignments
  • Magic-BLAST is an alternative

https://ncbi.github.io/magicblast/

https://ncbiinsights.ncbi.nlm.nih.gov/2016/10/13/introducing-magic-blast/

9 / 10

BWA, Bowtie aligners

Bowtie employs Burrows-Wheeler transform (BWT) based on the full-text minute-space (FM) index. Index is built using bwa index, bowtie2-build

acaacg$ → $acaacg →(sort) $acaacg → gc$aaac
g$acaac aacg$ac
cg$acaa acaacg$
acg$aca acg$aca
aacg$ac caacg$a
caacg$a cg$acaa
acaacg$ g$acaac
  • Bowtie has a memory footprint of 1.3 GB for the human genome. Very fast.
  • The last first mapping can transform it back to the original sequence.
10 / 10

Alignment goals

  • Homology - sequence similarity - helps to infer functions of genes uncharacterized in one organism but known in another

  • Global sequence alighment (Needleman-Wunch)

  • Gapped local sequence alignment (Smith-Waterman)

2 / 10
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow