RNA-seq
Lowe, Rohan, Neil Shirley, Mark Bleackley, Stephen Dolan, and Thomas Shafee. “Transcriptomics Technologies.” PLoS Computational Biology, (May 2017)
Conesa, Ana, Pedro Madrigal, Sonia Tarazona, David Gomez-Cabrero, Alejandra Cervera, Andrew McPherson, Michał Wojciech Szcześniak, et al. “A Survey of Best Practices for RNA-Seq Data Analysis.” Genome Biology, (December 2016) - RNA-seq analysis roadmap, QC. Differential expression. TPM. Tools for alternative splicing detection and visualization. Small RNA analysis. Single-cell RNA-seq. Integrative analysis, including with methylation data.
Garber, Manuel, Manfred G. Grabherr, Mitchell Guttman, and Cole Trapnell. “Computational Methods for Transcriptome Annotation and Quantification Using RNA-Seq.” Nature Methods, (June 2011) - RNA-seq alignment and quantification. Table of tools. Transcriptome reconstruction. Alternative splicing
Wang, Zhong, Mark Gerstein, and Michael Snyder. “RNA-Seq: A Revolutionary Tool for Transcriptomics.” Nature Reviews. Genetics, (January 2009) - RNA-seq review.
Williams, Alexander G., Sean Thomas, Stacia K. Wyman, and Alisha K. Holloway. “RNA-Seq Data: Challenges in and Recommendations for Experimental Design and Analysis.” In Current Protocols in Human Genetics, John Wiley & Sons, Inc., 2014. - RNA-seq basics, tools, simulations
Altman, Naomi, and Martin Krzywinski. “Points of Significance: Sources of Variation.” Nature Methods, (December 30, 2014)
Martin, Jeffrey A., and Zhong Wang. “Next-Generation Transcriptome Assembly.” Nature Reviews. Genetics, (September 7, 2011) - Transcriptome assembly. Sequencing technologies overview. Reference-based and de novo assembly, combined approach idea. Splice graph, De Bruijn graph.
Marioni, John C., Christopher E. Mason, Shrikant M. Mane, Matthew Stephens, and Yoav Gilad. “RNA-Seq: An Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays.” Genome Research, (September 2008) - Illumina sequencing - microarray comparison. Good agreement. Assessing lane effect with hypergeometric distribution. Likelihood ratio test for differential expression. Chi-squared goodness-of-fit test.
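Marioni et al. test for differential expression with a likelihood ratio test under a Poisson model. Below is a minimal sketch of such a test for one gene in two samples, assuming the library size (total reads) is the only normalization and using a chi-squared distribution with one degree of freedom for the p-value. It is not the paper's code, and `poisson_lrt` is a hypothetical helper.

```python
import math

def poisson_loglik(k, mu):
    # Poisson log-likelihood without the log(k!) term, which cancels in the ratio.
    # For k == 0 the k*log(mu) term is zero, so mu == 0 is allowed.
    return k * math.log(mu) - mu if k > 0 else -mu

def poisson_lrt(x1, n1, x2, n2):
    """Hypothetical helper: LRT for one gene.

    x1, x2: read counts of the gene in samples 1 and 2.
    n1, n2: library sizes (total reads) of the two samples.
    Returns (LRT statistic, p-value).
    """
    # Alternative: each sample has its own rate per read, so the fitted means are x1, x2.
    ll_alt = poisson_loglik(x1, x1) + poisson_loglik(x2, x2)
    # Null: one shared rate per read, estimated from the pooled counts.
    p = (x1 + x2) / (n1 + n2)
    ll_null = poisson_loglik(x1, n1 * p) + poisson_loglik(x2, n2 * p)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)  # clamp tiny negative rounding errors
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, `poisson_lrt(200, 1_000_000, 50, 1_000_000)` gives a statistic of about 96 and a very small p-value. Pure Poisson tests like this ignore biological variability between replicates, which is why later methods moved to the negative binomial.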
Peixoto, Lucia, Davide Risso, Shane G. Poplawski, Mathieu E. Wimmer, Terence P. Speed, Marcelo A. Wood, and Ted Abel. “How Data Analysis Affects Power, Reproducibility and Biological Insight of RNA-Seq Studies in Complex Datasets.” Nucleic Acids Research, (September 18, 2015) - The importance of RNA-seq normalization and batch effect removal. RUVSeq increases power, but the choice of the number of latent variables is important. Tutorial: steps in RNA-seq data processing, normalization and exploratory data analysis.
Wang, Eric T., Rickard Sandberg, Shujun Luo, Irina Khrebtukova, Lu Zhang, Christine Mayr, Stephen F. Kingsmore, Gary P. Schroth, and Christopher B. Burge. “Alternative Isoform Regulation in Human Tissue Transcriptomes.” Nature, (November 27, 2008) - Alternative splicing comparison between tissues. ~94% of human genes undergo alternative splicing. Alternative splicing varies much more between tissues than between individuals.
Park, Eddie, Zhicheng Pan, Zijun Zhang, Lan Lin, and Yi Xing. “The Expanding Landscape of Alternative Splicing Variation in Human Populations.” The American Journal of Human Genetics, (January 2018) - Alternative splicing, detailed overview
Statistics
Pachter, Lior. “Models for Transcript Quantification from RNA-Seq.” ArXiv, 2011. - RNA-seq quantification statistics, expectation-maximization algorithm
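Below is a minimal sketch of the expectation-maximization (EM) algorithm for transcript quantification. It assumes reads have already been collapsed into classes of compatible transcripts and that effective transcript lengths are known. `em_abundance` and its input format are hypothetical. Real quantifiers add fragment-length, bias and other terms to the model.

```python
def em_abundance(classes, lengths, n_iter=1000, tol=1e-10):
    """Hypothetical helper: EM estimate of transcript abundances.

    classes: dict mapping a tuple of compatible transcript ids -> number of reads.
    lengths: dict mapping transcript id -> effective length.
    Returns (alpha, rho): alpha[t] is the fraction of reads from t, and
    rho[t] is the relative abundance of t (alpha corrected for length).
    """
    transcripts = sorted(lengths)
    n_reads = sum(classes.values())
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        # E-step: split each read class among its compatible transcripts in
        # proportion to alpha_t / length_t (probability of generating that read).
        expected = {t: 0.0 for t in transcripts}
        for compat, count in classes.items():
            weights = {t: alpha[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += count * w / total
        # M-step: new read fractions are the expected read counts over all reads.
        new_alpha = {t: expected[t] / n_reads for t in transcripts}
        delta = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if delta < tol:
            break
    # Longer transcripts yield more reads, so divide by length and renormalize.
    rho_unnorm = {t: alpha[t] / lengths[t] for t in transcripts}
    z = sum(rho_unnorm.values())
    return alpha, {t: v / z for t, v in rho_unnorm.items()}
```

With 300 reads unique to A, 100 unique to B and 400 shared, and equal lengths, the shared reads end up split 3:1. The result is alpha of 0.75 for A and 0.25 for B.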
Robinson, Mark D., and Alicia Oshlack. “A Scaling Normalization Method for Differential Expression Analysis of RNA-Seq Data.” Genome Biology, (2010) - TMM normalization method. Problems with library-size scaling normalization. Well-written, intuitive motivating example. MA plot. The scaling factor is a weighted mean of M values, with weights equal to the inverse of their approximate variance, computed after trimming (by default) the upper and lower 30% of M values and the upper and lower 5% of A values.
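Below is a minimal sketch of the TMM factor for one sample against a reference sample, following the description above. It computes per-gene M and A values, trims both tails of each, and takes an inverse-variance weighted mean of the remaining M values. `tmm_factor` is a hypothetical helper, not edgeR's `calcNormFactors`. It omits edgeR refinements such as choosing the reference sample and rescaling the factors to multiply to one, and it breaks rank ties by position.

```python
import math

def _ranks(values):
    # 1-based ordinal ranks; ties are broken by position.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def tmm_factor(obs, ref, m_trim=0.3, a_trim=0.05):
    """Hypothetical helper: TMM factor of sample `obs` relative to `ref` (per-gene counts)."""
    n_obs, n_ref = sum(obs), sum(ref)
    genes = []  # (M, A, weight) for genes with nonzero counts in both samples
    for y_k, y_r in zip(obs, ref):
        if y_k == 0 or y_r == 0:
            continue  # M and A are undefined when either count is zero
        p_k, p_r = y_k / n_obs, y_r / n_ref
        m = math.log2(p_k / p_r)        # log fold change
        a = 0.5 * math.log2(p_k * p_r)  # average log expression
        # Approximate variance of M; its inverse is the gene's weight.
        var = (n_obs - y_k) / (n_obs * y_k) + (n_ref - y_r) / (n_ref * y_r)
        genes.append((m, a, 1.0 / var))
    n = len(genes)
    # Keep genes whose M rank and A rank both lie outside the trimmed tails.
    lo_m = math.floor(n * m_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = math.floor(n * a_trim) + 1
    hi_a = n + 1 - lo_a
    rank_m = _ranks([g[0] for g in genes])
    rank_a = _ranks([g[1] for g in genes])
    num = den = 0.0
    for (m, _, w), rm, ra in zip(genes, rank_m, rank_a):
        if lo_m <= rm <= hi_m and lo_a <= ra <= hi_a:
            num += w * m
            den += w
    return 2 ** (num / den) if den > 0 else 1.0
```

In a toy case with 19 unchanged genes and one gene that takes over half of the second library, the outlier is trimmed. The factor then comes out at about 0.51. The observed sample's effective library size (library size times factor) therefore matches the reference, rather than being inflated by the single gene.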
“How not to perform a differential expression analysis (or science)” blog post by Lior Pachter, on the similarities and differences between Salmon and kallisto, with general references
Robinson, Mark D., and Gordon K. Smyth. “Small-Sample Estimation of Negative Binomial Dispersion, with Applications to SAGE Data.” Biostatistics (Oxford, England), (April 2008) - Negative Binomial distribution instead of Poisson. Previous models: binomial, Poisson.
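The negative binomial adds a dispersion parameter φ to the Poisson mean-variance relation: Var = μ + φμ², with φ = 0 recovering the Poisson. Below is a minimal sketch of that relation and of a simple method-of-moments estimate of φ for one gene's replicate counts. This estimator is a swapped-in illustration, not the small-sample estimator developed in the paper. Both helper names are hypothetical.

```python
from statistics import mean, variance

def nb_variance(mu, phi):
    # Negative binomial variance; phi = 0 gives the Poisson variance (= mu).
    return mu + phi * mu ** 2

def nb_dispersion_mom(counts):
    """Hypothetical helper: method-of-moments dispersion from >= 2 replicate counts."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    # Solve s2 = mu + phi * mu^2 for phi; clamp at 0 when data look under-dispersed.
    return max((s2 - mu) / mu ** 2, 0.0)
```

With φ = 0.1 and a mean of 100 reads, the variance is about 1100, eleven times the Poisson value. With few replicates such per-gene estimates are very noisy, which motivates the information-borrowing approaches in this section.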
Lun, Aaron T. L., and Gordon K. Smyth. “No Counts, No Variance: Allowing for Loss of Degrees of Freedom When Assessing Biological Variability from RNA-Seq Data.” Statistical Applications in Genetics and Molecular Biology, (April 25, 2017) - Negative impact of genes with zero counts on GLM framework for RNA-seq differential expression analysis. Overdispersion, GLM, quasi-likelihood F-test, adjusting degrees of freedom for zero-count genes.
Law, Charity W, Yunshun Chen, Wei Shi, and Gordon K Smyth. “Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts.” Genome Biology, (2014) - voom paper
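voom starts from log-counts per million with small offsets, log2((r + 0.5) / (R + 1) × 10^6), where r is a read count and R the library size. It then fits a lowess trend of the square-root residual standard deviation against average log-count. From that trend it derives per-observation precision weights for limma. Below is a minimal sketch of only the log-CPM step; the trend and weights are omitted. `voom_log_cpm` is a hypothetical helper, not the limma function.

```python
import math

def voom_log_cpm(counts, norm_factors=None):
    """Hypothetical helper: log2-CPM as used by voom.

    counts: genes x samples (list of lists of read counts).
    norm_factors: optional per-sample factors (e.g. TMM) giving effective library sizes.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if norm_factors is not None:
        lib_sizes = [size * f for size, f in zip(lib_sizes, norm_factors)]
    # The +0.5 and +1 offsets keep zero counts finite and avoid log(0).
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n_samples)]
        for row in counts
    ]
```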
Love, Michael I, Wolfgang Huber, and Simon Anders. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology, (December 2014) - DESeq2 paper. Problems with ranking genes by fold change, with shrinkage of fold changes as the proposed solution. Generalized linear model using the negative binomial distribution. Borrowing information across genes: genes of similar average expression have similar dispersion. rlog transformation. The original DESeq publication
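DESeq and DESeq2 normalize with size factors from the median-of-ratios method. Each sample's counts are divided by the per-gene geometric mean across samples, and the median of these ratios is that sample's size factor. Below is a minimal sketch assuming a genes × samples list of lists. It skips genes with a zero in any sample, because their geometric mean is zero. `size_factors` is a hypothetical helper, not DESeq2's `estimateSizeFactors` code.

```python
import math
from statistics import median

def size_factors(counts):
    """Hypothetical helper: median-of-ratios size factors.

    counts: genes x samples (list of lists of read counts).
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean would be zero; gene is not used
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median over genes is robust to a minority of differentially expressed genes.
    return [math.exp(median(r)) for r in log_ratios]
```

If every gene in sample 2 has twice the counts of sample 1, the factors come out as 1/√2 and √2. Dividing each sample's counts by its factor puts them on a common scale.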
Witten, Daniela M. “Classification and Clustering of Sequencing Data Using a Poisson Model.” The Annals of Applied Statistics, (December 2011) - RNA-seq modeling with the Poisson distribution. Samples × genes matrix. Derivation of Poisson and negative binomial models; using the Poisson model for linear discriminant analysis and clustering (Poisson dissimilarity).
Patro, Rob, Geet Duggal, Michael I Love, Rafael A Irizarry, and Carl Kingsford. “Salmon Provides Fast and Bias-Aware Quantification of Transcript Expression.” Nature Methods, (March 6, 2017) - Salmon paper. Lightweight mapping (quasi-mapping) to the transcriptome, or quantification from precomputed alignments to the transcriptome. Dual-phase statistical inference procedure and sample-specific bias models that account for sequence-specific, fragment GC content, and positional biases. Comparison with kallisto and Sailfish. Tests on simulated (Polyester, RSEM-sim) and real (GEUVADIS, SEQC) data. Detailed Methods description. COMBINE-lab/Salmon GitHub repo
Jiang, Hui, and Wing Hung Wong. “Statistical Inferences for Isoform Expression in RNA-Seq.” Bioinformatics, (April 15, 2009) - Alternative splicing statistics - Poisson modeling. Problem - most reads are shared by more than one isoform. How to quantify isoform expression from exon counts. Detailed statistical derivations
Young, Matthew D., Matthew J. Wakefield, Gordon K. Smyth, and Alicia Oshlack. “Gene Ontology Analysis for RNA-Seq: Accounting for Selection Bias.” Genome Biology, (2010) - Gene set enrichment analysis accounting for length and expression of transcripts. Instead of random sampling, use of the Wallenius non-central hypergeometric distribution to account for biased sampling. GOseq R package
Law, Charity W., Kathleen Zeglinski, Xueyi Dong, Monther Alhamdoosh, Gordon K. Smyth, and Matthew E. Ritchie. “A Guide to Creating Design Matrices for Gene Expression Experiments.” F1000Research (December 10, 2020) - Design matrices for various experimental designs. Means model or mean-reference model.
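The two parameterizations named above can be compared on a single-factor design. In the means model, each group has its own indicator column and the coefficients are the group means. In the mean-reference model, an intercept holds the reference group's mean and the remaining coefficients are differences from it. Below is a minimal NumPy sketch; the group labels, expression values and helper names are hypothetical.

```python
import numpy as np

def design_means(groups, levels):
    # Means model: no intercept, one indicator column per group.
    return np.array([[1.0 if g == lev else 0.0 for lev in levels] for g in groups])

def design_mean_reference(groups, levels):
    # Mean-reference model: intercept (reference = levels[0]) plus
    # indicator columns for every other group.
    return np.array([[1.0] + [1.0 if g == lev else 0.0 for lev in levels[1:]] for g in groups])

groups = ["ctrl", "ctrl", "trtA", "trtA", "trtB", "trtB"]
levels = ["ctrl", "trtA", "trtB"]
y = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0])  # hypothetical log-expression of one gene

beta_means, *_ = np.linalg.lstsq(design_means(groups, levels), y, rcond=None)
beta_ref, *_ = np.linalg.lstsq(design_mean_reference(groups, levels), y, rcond=None)
print(beta_means)  # group means: ctrl, trtA, trtB
print(beta_ref)    # ctrl mean, trtA - ctrl, trtB - ctrl
```

Both designs give the same fitted values. They differ only in what each coefficient means, which determines the contrasts needed to test a given comparison.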
Soneson, C, F Marini, F Geier, MI Love, and MB Stadler. “ExploreModelMatrix: Interactive Exploration for Improved Understanding of Design Matrices and Linear Models in R” F1000Research, (June 4, 2020).
Workflows and tools
Introduction to DGE - differential expression analysis by DESeq2, by the teaching team at the Harvard Chan Bioinformatics Core (HBC)
He, Wen, Shanrong Zhao, Chi Zhang, Michael S. Vincent, and Baohong Zhang. “QuickRNASeq: Guide for Pipeline Implementation and for Interactive Results Visualization.” Springer New York, 2018. - Practical RNA-seq tutorial based on QuickRNASeq publication.
Love, Michael I., Simon Anders, Vladislav Kim, and Wolfgang Huber. “RNA-Seq Workflow: Gene-Level Exploratory Analysis and Differential Expression.” F1000Research (November 17, 2016) - RNA-seq workflow. From count import, including tximport, through EDA, DESeq2, batch removal, time course analysis, visualization.
Griffith, Malachi, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, and Obi L. Griffith. “Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.” PLoS Computational Biology, (August 2015) - Introduction to RNA-seq technology and analysis. The supplementary tables are particularly useful. The full tutorials
Sahraeian, Sayed Mohammad Ebrahim, Marghoob Mohiyuddin, Robert Sebra, Hagen Tilgner, Pegah T. Afshar, Kin Fai Au, Narges Bani Asadi, et al. “Gaining Comprehensive Biological Insight into the Transcriptome by Performing a Broad-Spectrum RNA-Seq Analysis.” Nature Communications, (December 2017) - RNACocktail: benchmarking of RNA-seq tools. Fig. 1 gives a structured overview of all aspects of RNA-seq analysis; Fig. 8 lists recommended tools. RNACocktail - A comprehensive framework for accurate and efficient RNA-Seq analysis; RNA-seq blog: Unleash the power within RNA-seq
Law, Charity W., Monther Alhamdoosh, Shian Su, Gordon K. Smyth, and Matthew E. Ritchie. “RNA-Seq Analysis Is Easy as 1-2-3 with Limma, Glimma and EdgeR.” F1000Research (2016) - Rsubread-to-limma pipeline. Complete R code for the RNA-seq analysis tutorial
Pertea, Mihaela, Daehwan Kim, Geo M. Pertea, Jeffrey T. Leek, and Steven L. Salzberg. “Transcript-Level Expression Analysis of RNA-Seq Experiments with HISAT, StringTie and Ballgown.” Nature Protocols, (September 2016) - New Tuxedo suite. Protocol.
“RNA-Seq Methods and Algorithms” 7-minute video by Harold Pimentel: pseudoalignment, kallisto, sleuth, practical aspects
Zhao, Qi, Yubin Xie, Peng Nie, Rucheng Diao, Licheng Sun, Zhixiang Zuo, and Jian Ren. “IDEA: A Web Server for Interactive Differential Expression Analysis with R Packages,” July 3, 2018. - Differential expression analysis from a matrix of FPKMs and a design matrix. Several methods to detect DEGs (DESeq2, edgeR, NOISeq, PoissonSeq, SAMseq), plots (MA, volcano, heatmap). http://renlab.org:3838/IDEA/
TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files. The inputs are the GTF file used to generate the alignments and one or more BAM files containing single-end or paired-end sequencing reads. For each sample, TPMCalculator outputs four files reporting TPM values and raw read counts for genes, transcripts, exons and introns, respectively. https://github.com/ncbi/TPMCalculator
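TPM (transcripts per million) divides each feature's read count by its length and rescales the results so they sum to one million within a sample. Below is a minimal sketch of that formula with feature lengths supplied directly. TPMCalculator instead derives feature lengths from the GTF, and this is not its code. `tpm` is a hypothetical helper.

```python
def tpm(counts, lengths):
    """Hypothetical helper: TPM from per-feature read counts and lengths.

    Lengths can be in bases or kilobases; the units cancel in the ratio.
    """
    # Reads per unit length: corrects for longer features attracting more reads.
    rates = [c / length for c, length in zip(counts, lengths)]
    total = sum(rates)
    # Rescale so the sample's values sum to 1e6.
    return [r / total * 1e6 for r in rates]
```

Two features with 100 and 200 reads and lengths of 1,000 and 2,000 bases have the same read rate per base, so each gets 500,000 TPM. Because TPM always sums to the same total, values are comparable as proportions within and across samples.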
RNA-seq resources
- Tools for RNA-seq data analysis: RNA-seq analysis notes by Tommy Tang
- University of Oregon’s RNA-seqlopedia, a comprehensive guide to RNA-seq starting with experimental design, going through library prep, sequencing, and data analysis.
- RNA-seq blog, http://www.rna-seqblog.com/, Several blog posts per week on new methods and tools for RNA-seq analysis
- Informatics for RNA-seq, by Griffith lab
Practicals
RNA-seq workflow: gene-level exploratory analysis and differential expression R package
RNA-seq analysis exercise using Galaxy: an example analysis using the TopHat + Cufflinks workflow.
“enrichOmics” - Functional enrichment analysis of high-throughput omics data. From basic ExpressionSet differential and functional enrichment analysis to genomic region enrichment analysis and MultiAssayExperiment demo