Highly expressed genes positively correlated with:
The opposite is true for lowly expressed genes
Chromosome 19 is the most gene-dense chromosome in the human genome
Retrotransposons - fossil records of evolution
A typical human genome differs from the reference genome at 4.1 to 5.0 million sites - Single Nucleotide Polymorphisms (SNPs)
Determine the "complete" sequence of a human haploid genome
Identify the sequence and location of every protein coding gene
Use as a "map" with which to track the location and frequency of genetic variation in the human genome
Unravel the genetic architecture of inherited and somatic human diseases
Understand genome and species evolution
Sequencing by synthesis (not degradation)
Radioactive primers hybridize to DNA
Polymerase + normal dNTPs + ddNTPs (dideoxynucleotide terminators) at low concentration
1 lane per base, visually interpret ladder
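The chain-termination readout above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the names `sanger_ladder` and `read_ladder` are hypothetical, the template is taken as written 3' to 5' so the new strand is its plain complement, and termination is assumed to occur at every possible position instead of being sampled at low ddNTP concentration.

```python
def sanger_ladder(template: str) -> dict[str, list[int]]:
    """Return, for each of the four lanes, the lengths of terminated fragments.

    Toy assumption: in the lane containing ddX, some copy of the growing
    strand terminates at every position where X is incorporated.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # Template is assumed written 3'->5', so the synthesized strand (5'->3')
    # is the base-by-base complement.
    synthesized = "".join(complement[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)  # a fragment of this length ends in ddX
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read the ladder from shortest to longest fragment to recover the sequence."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = sanger_ladder("TACG")
print(lanes)               # one list of fragment lengths per lane
print(read_ladder(lanes))  # sequence of the synthesized strand
```

Reading the four lanes together, shortest fragment first, is what "visually interpret ladder" amounts to in this simplified picture.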
"Massively parallel" sequencing
"High-throughput" sequencing
"Ultra high-throughput" sequencing
"Next generation" sequencing (NGS)
"Second generation" sequencing
2005: 454 (Roche)
2006: Solexa (Illumina)
2007: ABI/SOLiD (Life Technologies)
2010: Complete Genomics
2010: Ion Torrent (Life Technologies)
2011: Pacific Biosciences
2015: Oxford Nanopore Technologies
Cut the long DNA into smaller segments (several hundreds to several thousand bases)
Sequence each segment: start from one end and sequence along the chain, base by base
The process stops after a while because the noise level is too high
Sequencing produces many sequence pieces of varying length: typically up to about a thousand bases from Sanger and several hundred from NGS
The sequence pieces are called "reads" for NGS data
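The fragment-then-read procedure above can be sketched as a small simulation. This is an illustrative sketch, not a model of any particular platform: `shotgun_reads` and its parameters are hypothetical names, fragments are drawn uniformly at random, and the "noise stops the read" behavior is approximated by simply truncating each read at a fixed `read_len`.

```python
import random


def shotgun_reads(genome, n_fragments, min_len, max_len, read_len, seed=0):
    """Cut random fragments from `genome` and read each from one end.

    Assumes len(genome) >= max_len. Every read is truncated at `read_len`
    bases, standing in for the point where signal quality becomes too low.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reads = []
    for _ in range(n_fragments):
        frag_len = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - frag_len)
        fragment = genome[start:start + frag_len]
        reads.append(fragment[:read_len])  # sequence from one end, base by base
    return reads


genome = "ACGT" * 250  # hypothetical 1,000-base toy "genome"
reads = shotgun_reads(genome, n_fragments=5, min_len=200, max_len=500, read_len=100)
print(len(reads), [len(r) for r in reads])
```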
PCR amplify DNA fragments
Immobilize fragments on a solid surface, amplify
Reversible terminator sequencing with 4 color dye-labelled nucleotides
Videos of Illumina sequencing: http://www.youtube.com/watch?v=77r5p8IBwJk (1.5 min), https://www.youtube.com/watch?v=fCd6B5HRaZ8 (5 min)
Advantages:
Disadvantages:
Single-end sequencing: sequence one end of the DNA segment.
Paired-end sequencing: sequence both ends of a DNA segment.
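The difference between the two modes can be shown on a toy fragment. This is a minimal sketch with hypothetical function names; it assumes the common convention that the second read of a pair is reported from the opposite strand, i.e. as the reverse complement of the fragment's far end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end(fragment: str, read_len: int) -> str:
    """Read `read_len` bases from one end of the fragment."""
    return fragment[:read_len]


def paired_end(fragment: str, read_len: int) -> tuple[str, str]:
    """Read `read_len` bases from each end of the fragment.

    Read 2 comes from the opposite strand, so it is the start of the
    fragment's reverse complement.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2


fragment = "AAACCCGGGTTTACGT"  # hypothetical 16-base fragment
print(single_end(fragment, 4))
print(paired_end(fragment, 4))
```

Because both reads of a pair come from the same fragment, their expected separation is known, which is what makes paired-end data useful for placing reads in repetitive regions.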
NGS has a wide range of applications
DNA-seq: sequence genomic DNA
RNA-seq: sequence RNA products
ChIP-seq: detect protein-DNA interaction sites
Bisulfite sequencing (BS-seq): measure DNA methylation levels
Many others
NGS has largely replaced microarrays because it yields better data: greater dynamic range and higher signal-to-noise ratios.
Sequence the untreated genomic DNA.
Goals: Compare with the reference genome and look for genetic variants
Single nucleotide polymorphisms (SNPs)
Targeted sequencing, e.g., exome sequencing
Metagenomic sequencing
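The DNA-seq goal of comparing a sample with the reference genome to find variants can be sketched in a few lines. This is a deliberately simplified illustration: `find_snps` is a hypothetical helper, the two sequences are assumed to be already aligned and of equal length, and real variant calling works from many overlapping reads with base qualities, not from a single consensus string.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes `reference` and `sample` are pre-aligned, gap-free, and the
    same length, so only single-base substitutions can be detected.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences differing at positions 4 and 8.
print(find_snps("ACGTACGT", "ACGAACGC"))
```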
Sequence the "transcriptome": the set of RNA molecules
Goals:
Determine transcriptional structures: alternative splicing, gene fusion, etc.
Quantify gene expression: the sequencing version of gene expression microarray
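Quantifying expression from RNA-seq starts from per-gene read counts, which have to be scaled by sequencing depth before samples can be compared. The sketch below uses counts per million (CPM) as one common choice; the source does not name a normalization method, so treat CPM, the function name, and the gene names as assumptions made for illustration.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw per-gene read counts to counts per million mapped reads.

    CPM corrects for sequencing depth only; it does not account for gene
    length or for composition differences between samples.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical read counts for three genes in one sample.
raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
print(counts_per_million(raw_counts))
```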
Chromatin-Immunoprecipitation (ChIP) followed by sequencing (seq): sequencing version of ChIP-chip
Used to detect locations of certain "events" on the genome:
A type of "capture" sequencing: the ChIP step captures the genomic regions of interest
~21,000 protein coding genes
PolyA+
PolyA-
Most (62%) of the genome is transcribed
~12,000 pseudogenes – results of duplications
~10,000 lncRNA = noncoding RNAs >200bp
~9000 small RNAs - many of the lncRNA transcripts are processed into stable small RNAs
~82,000 – 128,000 transcription start sites - depending on detection method
~5,000 RNA edits occur post transcription
https://www.forbes.com/forbes/2009/1005/revolutionaries-science-genomics-gene-machine.html
Key Points:
Caveats:
Nanonet, Albacore, Scrappie
poretools - a toolkit for analyzing nanopore sequence data.