Genomic variation
Single Nucleotide Polymorphisms
Koboldt, Daniel C. “Best Practices for Variant Calling in Clinical Sequencing.” Genome Medicine, (December 2020) - Introduction to genomic variant calling, panel/exome/whole genome sequencing technologies (Table 1), preprocessing, analysis (SNVs/indels, mutations, CNVs, SVs, gene fusions, Table 2), gold standard datasets (GIAB), best practices, filtering for each type of genomic variant.
1000 Genomes Project Consortium, Adam Auton, Lisa D. Brooks, Richard M. Durbin, Erik P. Garrison, Hyun Min Kang, Jan O. Korbel, et al. “A Global Reference for Human Genetic Variation.” Nature 526, no. 7571 (October 1, 2015)
Bamshad, Michael J., Sarah B. Ng, Abigail W. Bigham, Holly K. Tabor, Mary J. Emond, Deborah A. Nickerson, and Jay Shendure. “Exome Sequencing as a Tool for Mendelian Disease Gene Discovery.” Nature Reviews. Genetics 12, no. 11 (September 27, 2011) - Exome sequencing technology, limitations, use for diagnostics, family studies.
McCarthy, Davis J, Peter Humburg, Alexander Kanapin, Manuel A Rivas, Kyle Gaulton, Jean-Baptiste Cazier, and Peter Donnelly. “Choice of Transcripts and Software Has a Large Effect on Variant Annotation.” Genome Medicine, (2014) - SNP annotation depends on the choice of transcripts and software. Types of SNPs. Ambiguous annotations.
Pabinger, S., A. Dander, M. Fischer, R. Snajder, M. Sperk, M. Efremova, B. Krabichler, M. R. Speicher, J. Zschocke, and Z. Trajanoski. “A Survey of Tools for Variant Analysis of Next-Generation Genome Sequencing Data.” Briefings in Bioinformatics, (March 1, 2014) - SNP calling and analysis tools overview. Germline, somatic, CNV, SV detection. Variant annotation tools.
MacArthur, D. G., T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, et al. “Guidelines for Investigating Causality of Sequence Variants in Human Disease.” Nature, (April 24, 2014) - Definitions and guidelines to define pathogenicity of SNPs
Structural and Copy Number Variants
Liu, Biao, Jeffrey M. Conroy, Carl D. Morrison, Adekunle O. Odunsi, Maochun Qin, Lei Wei, Donald L. Trump, Candace S. Johnson, Song Liu, and Jianmin Wang. “Structural Variation Discovery in the Cancer Genome Using next Generation Sequencing: Computational Solutions and Perspectives.” Oncotarget 6, no. 8 (March 20, 2015) - Structural variants and tools. Six SV types: deletion, insertion, tandem duplication, inversion, intra- and interchromosomal translocations (Figure 1). Detection from paired-end sequencing signatures: discordant read pairs, split reads (Figure 2, Tools in Table 1). Description of each tool.
Quinlan, Aaron R., and Ira M. Hall. “Characterizing Complex Structural Variation in Germline and Somatic Genomes.” Trends in Genetics: TIG 28, no. 1 (January 2012) - SV review, types, how generated, technologies for detection (Box 1. depth, paired-end, split-read)
Alkan, Can, Bradley P. Coe, and Evan E. Eichler. “Genome Structural Variation Discovery and Genotyping.” Nature Reviews Genetics 12, no. 5 (May 2011) - Review of CNV and structural variant detection
Zhang, Feng, Wenli Gu, Matthew E. Hurles, and James R. Lupski. “Copy Number Variation in Human Health, Disease, and Evolution.” Annual Review of Genomics and Human Genetics 10 (2009) - CNV review, mechanisms, analytical difficulties, roles in individual diseases
Trost, Brett, Susan Walker, Zhuozhi Wang, Bhooma Thiruvahindrapuram, Jeffrey R. MacDonald, Wilson W. L. Sung, Sergio L. Pereira, et al. “A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data.” American Journal of Human Genetics 102, no. 1 (January 4, 2018) - CNV (>1kb) read-depth detection workflow, from experimental considerations to computational analysis. HuRef and NA12878 benchmark genomes, supplemental files contain CNV genomic coordinates. CNVnator and ERDS perform optimally. Tools comparison, links to resources.
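The read-depth idea behind the CNV workflow above can be illustrated with a toy sketch: depth is summarized in fixed-size bins, each bin is compared to the sample median as a log2 ratio, and runs of consecutive outlier bins are reported as candidate gains or losses. This is not the algorithm used by CNVnator or ERDS; the function name, the 1 kb bin size, and the thresholds are assumptions chosen only for illustration.

```python
import math
from statistics import median


def call_cnv_segments(bin_depths, bin_size=1000, gain=0.4, loss=-0.6, min_bins=2):
    """Toy read-depth CNV caller (illustrative only).

    bin_depths: mean read depth per consecutive bin, first bin starting at 0.
    gain/loss:  log2(depth / median depth) cutoffs (arbitrary assumptions).
    Returns a list of (start, end, state) tuples, 0-based and half-open.
    """
    if not bin_depths:
        return []
    baseline = median(bin_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    # Classify every bin by its log2 ratio to the sample median.
    states = []
    for depth in bin_depths:
        ratio = math.log2(max(depth, 0.01) / baseline)  # floor avoids log2(0)
        if ratio >= gain:
            states.append("gain")
        elif ratio <= loss:
            states.append("loss")
        else:
            states.append(None)

    # Merge runs of identical states and keep runs of at least min_bins bins.
    segments = []
    run_start = 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[run_start]:
            state = states[run_start]
            if state is not None and i - run_start >= min_bins:
                segments.append((run_start * bin_size, i * bin_size, state))
            run_start = i
    return segments


depths = [30, 31, 29, 15, 14, 16, 30, 30, 45, 46, 30]
print(call_cnv_segments(depths))
```

Real workflows additionally correct for GC content and mappability and use statistical segmentation rather than fixed cutoffs, as discussed in the paper.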
GWAS
Bush, William S., and Jason H. Moore. “Chapter 11: Genome-Wide Association Studies.” PLoS Computational Biology, (December 27, 2012) - GWAS key concepts review. Examples of GWAS findings. Definitions of allele frequency and linkage disequilibrium (D’ and r^2), technology, study design, association tests using GLM, ANOVA, logistic regression frameworks, population stratification, meta-analysis considerations, data phasing, imputation. Tools.
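As a quick reminder of the linkage disequilibrium measures defined in that chapter, the sketch below computes D, |D’| and r^2 for two biallelic loci from haplotype and allele frequencies, using the standard textbook formulas. The function name and the example frequencies are made up for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take for these
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele B only ever occurs together with allele A.
print(ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2))
```

In this example |D’| is 1 while r^2 is only 0.25, which shows why the two measures are usually reported together: D’ reflects the absence of recombinant haplotypes, r^2 reflects how well one locus predicts the other.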
Tools
Xu, Chang. “A Review of Somatic Single Nucleotide Variant Calling Algorithms for Next-Generation Sequencing Data.” Computational and Structural Biotechnology Journal 16 (2018) - Overview of 46 somatic Single Nucleotide Variant (SNV) caller tools. Pre-processing, variant evaluation, and post-filtering steps. Four categories of algorithms, description of each, and the corresponding tools: matched tumor-normal (position-, haplotype-, machine learning-based methods, Table 1), single-sample (Table 1, 2, some offer somatic-germline classification), UMI-based (UMI technology, Figure 1, Table 3), and RNA-seq (Technology, issues, Table 4) variant calling. Benchmarking using tools for generating synthetic reads, spike-ins, GIAB, melanoma-normal samples, performance evaluation metrics. Issues in representing complex variants and tools for variant normalization. Deep neural network-based algorithms perform best.
Pirooznia, Mehdi, Melissa Kramer, Jennifer Parla, Fernando S. Goes, James B. Potash, W. Richard McCombie, and Peter P. Zandi. “Validation and Assessment of Variant Calling Pipelines for Next-Generation Sequencing.” Human Genomics 8 (July 30, 2014) - SNP pipeline benchmarking, GATK vs. samtools. GATK performed best in this comparison. The supplementary material lists the actual commands to run. Whole Exome Sequencing Analysis Pipeline
Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, David Bender, Julian Maller, et al. “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” American Journal of Human Genetics 81, no. 3 (September 2007) - PLINK - a tool for whole-genome association studies data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. Details of each task. gPLINK - graphical user interface integrated with HaploView. PLINK website
Shabalin, Andrey A. “Matrix EQTL: Ultra Fast EQTL Analysis via Large Matrix Operations.” Bioinformatics 28, no. 10 (May 15, 2012) - eQTL detection using linear regression/ANOVA models. Genotype by gene expression matrix multiplication to calculate model statistics. Handling of covariates, correlation structure, FDR correction, handling of cis/trans eQTLs. Matrix eQTL website
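The matrix-multiplication idea can be sketched for the simplest case (linear regression with no covariates): once every SNP row and every gene row is centered and scaled to unit length, the product of the two matrices holds all SNP-gene Pearson correlations, and each correlation maps to a t-statistic. This is a pure-Python illustration, not the Matrix eQTL implementation; all function names and the toy data are assumptions.

```python
import math


def standardize(row):
    """Center a row and scale it to unit Euclidean length."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant row has no variance")
    return [c / norm for c in centered]


def correlation_matrix(genotypes, expression):
    """Return the SNP x gene matrix of Pearson correlations.

    genotypes:  one row per SNP, one column per sample
    expression: one row per gene, same sample order
    """
    g = [standardize(row) for row in genotypes]
    e = [standardize(row) for row in expression]
    # Matrix product G * E^T: dot products of standardized rows are correlations.
    return [[sum(a * b for a, b in zip(g_row, e_row)) for e_row in e] for g_row in g]


def t_statistic(r, n):
    """t-statistic of a simple linear regression from correlation r and n samples."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data: one SNP (genotypes coded 0/1/2) and two genes across four samples.
corr = correlation_matrix([[0, 1, 2, 1]], [[0.0, 2.0, 4.0, 2.0], [2.0, 1.0, 0.0, 1.0]])
print(corr)
```

With a numerical library the nested comprehension becomes a single matrix product, which is what makes testing millions of SNP-gene pairs fast.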
Workflows and tutorials
MareesAT/GWA_tutorial - A comprehensive tutorial about GWAS and PRS
Van der Auwera, Geraldine A., Mauricio O. Carneiro, Chris Hartl, Ryan Poplin, Guillermo Del Angel, Ami Levy-Moonshine, Tadeusz Jordan, et al. “From FastQ Data to High Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline.” Current Protocols in Bioinformatics 43 (2013)
Reed, Eric, Sara Nunez, David Kulp, Jing Qian, Muredach P. Reilly, and Andrea S. Foulkes. “A Guide to Genome-Wide Association Analysis and Post-Analytic Interrogation.” Statistics in Medicine 34, no. 28 (December 10, 2015) - GWAS R tutorial. Workflow details, file types, filtering steps, PCA, post-analysis visualization.
Notes on whole exome and whole genome sequencing analysis by Ming Tang
Thousand Variant Callers Project GitHub repo: links and short descriptions of different genomic variant callers. https://github.com/deaconjs/ThousandVariantCallersRepo
“Wrangling genomics” SNP calling pipeline by DataCarpentry
Variant Annotation Workshop with FunciVAR, StateHub and MotifBreakR
Basic walk-throughs for alignment and variant calling from NGS sequencing data, by Erik Garrison.