Started December 13, 2005, phase II in 2009, ended in 2014
Mission - to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
Data generation
Biospecimen Core Resource (BCR) - Collect and process tissue samples
Genome Sequencing Centers (GSCs) - Use high-throughput genome sequencing to identify changes in DNA sequence in cancer
Genome Characterization Centers (GCCs) - Analyze genomic and epigenomic changes involved in cancer
Data Coordinating Center (DCC) - Centrally manages all TCGA data
Genome Data Analysis Centers (GDACs) - These centers provide informatics tools to facilitate broader use of TCGA data
An access control policy is in place for TCGA data to ensure that personally identifiable information is kept from unauthorized users
Open access - Houses data that cannot be aggregated to generate a data set unique to an individual. This tier does not require user certification for data access
Controlled access - Houses individually-unique information that could potentially be used to identify an individual. This tier requires user certification for data access
Access to controlled data is available to researchers who:
Agree to restrict their use of the information to biomedical research purposes only
Agree with the statements within TCGA Data Use Certification (DUC)
Have their institutions certify agreement with the statements within TCGA DUC
Complete the Data Access Request (DAR) form and submit it to the Data Access Committee to be a TCGA Approved User. This form is available electronically through dbGaP
https://wiki.nci.nih.gov/display/TCGA/TCGA+Home
Samples are identified by a TCGA barcode, e.g. TCGA-AO-A128, and a UUID, e.g. 7eea2b6e-771f-44c0-9350-38f45c8dbe87, which are bound to filenames
https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode
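A TCGA barcode is a hyphen-separated string whose fields identify the project, tissue source site, participant and, in longer forms, the sample, vial, portion, analyte, plate and center. The sketch below shows one way to split a barcode into those fields and to tell a barcode from a UUID. The regular expression, the function names and the full-length example barcode are illustrative assumptions, not part of any TCGA tool.

```python
import re
import uuid

# Field layout of a full aliquot barcode, e.g. TCGA-02-0001-01C-01D-0182-01.
# Shorter barcodes (such as the participant-level TCGA-AO-A128) stop early.
BARCODE_RE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"(?:-(?P<sample>\d{2})(?P<vial>[A-Z])?"
    r"(?:-(?P<portion>\d{2})(?P<analyte>[A-Z])?"
    r"(?:-(?P<plate>[A-Z0-9]{4})"
    r"(?:-(?P<center>\d{2}))?)?)?)?$"
)


def parse_barcode(barcode):
    """Split a TCGA barcode into its named fields (only those present)."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return {k: v for k, v in match.groupdict().items() if v is not None}


def sample_class(sample_code):
    """Map the two-digit sample code to tumor / normal / control."""
    code = int(sample_code)
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def is_uuid(text):
    """True if text is a canonical hyphenated UUID string."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


participant = parse_barcode("TCGA-AO-A128")
aliquot = parse_barcode("TCGA-02-0001-01C-01D-0182-01")
```

Here `participant` holds only the project, tissue source site and participant fields, while `aliquot` carries all nine; `sample_class(aliquot["sample"])` classifies the sample as tumor.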
genefu - R package for PAM50 classification and survival analysis. https://www.bioconductor.org/packages/release/bioc/html/genefu.html
Parker, Joel S., Michael Mullins, Maggie C. U. Cheang, Samuel Leung, David Voduc, Tammi Vickery, Sherri Davies, et al. “Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes.” Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, (March 10, 2009)
Standardized, analysis-ready TCGA datasets
Standardized analyses upon them
http://gdac.broadinstitute.org/
fbget - Python application programming interface (API) with >27 functions for Sample-level data, Firehose analyses, Standard data archives, Metadata access
firehose_get
FirebrowseR - An R client for the Broad Institute's Firehose pipeline, providing TCGA data sets
web-TCGA - a Shiny app to access TCGA data from FireBrowse
Launched on June 6, 2016. Provides standardized genomic and clinical data
Therapeutically Applicable Research To Generate Effective Treatments (TARGET) - A comprehensive genomic approach to determine molecular changes that drive childhood cancers. (AML and Neuroblastoma)
Cancer Cell Line Encyclopedia (CCLE) - Genome-wide information of ~1000 cell lines under baseline condition. Pharmacologic response profiles (IC50) and mutation status analysis
Stand Up To Cancer (SU2C) - 50 Breast cancer cell lines. GI50 to 77 therapeutic compounds
Connectivity Map - 4 cell lines and 1309 perturbagens at several concentrations. Gene expression change after treatment
The GDC Application Programming Interface (API)
GenomicDataCommons - GDC access in R
https://docs.gdc.cancer.gov/API/Users_Guide/Getting_Started/#api-endpoints
https://bioconductor.org/packages/GenomicDataCommons/
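As a rough illustration of how the GDC API is queried, the sketch below builds a request URL for the `files` endpoint using a JSON `filters` expression. It only constructs the URL and makes no network call. The helper name `build_files_query`, the chosen fields and the example project and data category are assumptions made for illustration; consult the API guide linked above for the authoritative list of endpoints and fields.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

GDC_API = "https://api.gdc.cancer.gov"


def build_files_query(project_id, data_category, size=10):
    """Build a URL that searches the GDC `files` endpoint (hypothetical helper)."""
    # Nested filter: both conditions must hold ("and" of two "in" clauses).
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_category",
                         "value": [data_category]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/files?{urlencode(params)}"


url = build_files_query("TCGA-BRCA", "Transcriptome Profiling", size=5)

# Decode the query string again to confirm the filter round-trips intact.
decoded = parse_qs(urlsplit(url).query)
decoded_filters = json.loads(decoded["filters"][0])
```

The resulting `url` can be passed to any HTTP client; the response is expected to list matching file records, whose `file_id` values can then be used with the `data` endpoint to download open-access files.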
Rich set of tools for visualization, analysis and download of large-scale cancer genomics data sets.
The Onco Query Language (OQL) to fine-tune queries
http://www.cbioportal.org/index.do
http://www.cbioportal.org/tutorial.jsp - short tutorials
Gao, Jianjiong, Bülent Arman Aksoy, Ugur Dogrusoz, Gideon Dresdner, Benjamin Gross, S. Onur Sumer, Yichao Sun, et al. “Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the CBioPortal.” Science Signaling, (April 2, 2013)
REST-based web API
CGDS-R - an R package that provides a basic set of functions for querying the Cancer Genomic Data Server (CGDS)
MATLAB CGDS Cancer Genomics Toolbox - data access functionality in the MATLAB environment
http://www.cbioportal.org/web_api.jsp
http://www.cbioportal.org/cgds_r.jsp
https://cran.r-project.org/web/packages/cgdsr/vignettes/cgdsr.pdf
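The CGDS clients listed above wrap a simple URL-based web service. The sketch below shows how such request URLs are assembled, assuming the legacy `webservice.do` interface with a `cmd` parameter described on the web API page; that interface may no longer be served by current cBioPortal releases. The helper name `cgds_url` and the study, case set and profile identifiers are illustrative assumptions. No request is sent.

```python
from urllib.parse import urlencode

# Legacy CGDS endpoint (assumption: may be retired on current servers).
CGDS_BASE = "http://www.cbioportal.org/webservice.do"


def cgds_url(cmd, **params):
    """Assemble a legacy CGDS query URL for the given command (hypothetical helper)."""
    query = {"cmd": cmd}
    query.update(params)
    return f"{CGDS_BASE}?{urlencode(query)}"


# List all available cancer studies.
studies_url = cgds_url("getCancerStudies")

# Fetch mutation data for three genes in one case set (illustrative IDs).
profile_url = cgds_url(
    "getProfileData",
    case_set_id="brca_tcga_all",
    genetic_profile_id="brca_tcga_mutations",
    gene_list="BRCA1 BRCA2 TP53",
)
```

Spaces in `gene_list` are encoded as `+`, which is how the service expects multiple gene symbols to be separated.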
curatedTCGAData - Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
HarmonizedTCGAData - Processed Harmonized TCGA Data of Five Selected Cancer Types
https://bioconductor.org/packages/curatedTCGAData/
https://bioconductor.org/packages/HarmonizedTCGAData/
curatedOvarianData, curatedCRCData (colorectal), curatedBladderData
TCGAbiolinks - an R package for integrative analysis of TCGA data
https://bioconductor.org/packages/TCGAbiolinks/
Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot, et al. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of TCGA Data.” Nucleic Acids Research, (May 5, 2016)
https://CRAN.R-project.org/package=TCGA2STAT
Formerly the UCSC Cancer Genomics Browser, now UCSC Xena
Includes TCGA, Cancer Cell Line Encyclopedia, the Stand Up To Cancer (SU2C) Breast Cancer data, custom datasets
A tool to visually explore and analyze cancer genomics data and its associated clinical information.
Gene- and genome-centric view
Survival analysis on user-defined subgroups
https://xenabrowser.net/, https://xenabrowser.net/datapages/, http://xena.ucsc.edu/getting-started/
Cline, Melissa S., Brian Craft, Teresa Swatloski, Mary Goldman, Singer Ma, David Haussler, and Jingchun Zhu. “Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser.” Scientific Reports (October 2, 2013)
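Survival analysis on user-defined subgroups usually means comparing Kaplan-Meier curves between groups of samples. The sketch below is a minimal pure-Python Kaplan-Meier estimator applied to a toy cohort split by an arbitrary expression threshold. The cohort values, the threshold and the function names are hypothetical and are not taken from Xena or any TCGA dataset.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events: 1 = event (e.g. death) observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def km_by_group(records, group_of):
    """Estimate one Kaplan-Meier curve per user-defined subgroup."""
    grouped = defaultdict(lambda: ([], []))
    for rec in records:
        t_list, e_list = grouped[group_of(rec)]
        t_list.append(rec["time"])
        e_list.append(rec["event"])
    return {g: kaplan_meier(t, e) for g, (t, e) in grouped.items()}


# Hypothetical toy cohort: follow-up time, event flag, gene expression.
cohort = [
    {"time": 1, "event": 1, "expr": 2.5},
    {"time": 2, "event": 0, "expr": 3.1},
    {"time": 3, "event": 1, "expr": 2.2},
    {"time": 4, "event": 1, "expr": 4.0},
    {"time": 5, "event": 1, "expr": 0.7},
    {"time": 6, "event": 0, "expr": 1.1},
]
curves = km_by_group(cohort, lambda r: "high" if r["expr"] >= 2.0 else "low")
```

Censored subjects leave the risk set without lowering the curve, which is why the "high" group drops only at times 1, 3 and 4. A real analysis would add a significance test between groups, such as the log-rank test.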
Goal - simplify centralized access to TCGA data and provide easy analysis
Three centers were funded to develop cloud access
http://cgc.systemsbiology.net/
https://software.broadinstitute.org/firecloud/
http://www.cancergenomicscloud.org/
Gonzalez-Perez, Abel, Christian Perez-Llamas, Jordi Deu-Pons, David Tamborero, Michael P Schroeder, Alba Jene-Sanz, Alberto Santos, and Nuria Lopez-Bigas. “IntOGen-Mutations Identifies Cancer Drivers across Tumor Types.” Nature Methods, (September 15, 2013)
The International Cancer Genome Consortium (ICGC)’s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients
5,789 whole genomes of tumors and matched normal tissue spanning 39 tumor types. RNA-Seq profiles were obtained from a subset of 1,284 of the donors
Similar to other large-scale genome projects, the ICGC has a Data Coordination Center (DCC)
http://icgc.org/, http://dcc.icgc.org/