stddata__2016_08_25 Samples Report
Overview
Introduction

The Broad GDAC mirrors data from the DCC on a daily basis. Although all data is mirrored, not every sample is ingested into Firehose. Three main mechanisms filter samples so that only the most scientifically relevant ones make it into our standard data and analysis runs: redactions, replicate filtering, and blacklisting. This report summarizes the data that is ingested into Firehose, describes the three filtering mechanisms, lists the samples that are removed, and gives all available annotations from the DCC's Annotation Manager.
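As a rough illustration of how the three mechanisms could combine, the sketch below applies them to a list of aliquot identifiers. It is not Firehose code: the function name, the input layout, the order in which the filters run, and the `rank` rule used to choose among replicates are all assumptions made for the example.

```python
from collections import defaultdict

def filter_aliquots(aliquots, redacted, blacklisted, sample_of, rank):
    """Apply redaction, blacklist, and replicate filtering to aliquots.

    aliquots    -- iterable of aliquot identifiers (hypothetical input)
    redacted    -- set of redacted aliquot identifiers
    blacklisted -- set of blacklisted aliquot identifiers
    sample_of   -- function mapping an aliquot to its sample identifier
    rank        -- function scoring an aliquot; the highest score is kept
                   (a placeholder, the real replicate rule is not given here)
    Returns (kept, removed), where removed maps a reason to a list.
    """
    removed = defaultdict(list)
    survivors = []
    # Assumed order: drop redacted and blacklisted aliquots first.
    for aliquot in aliquots:
        if aliquot in redacted:
            removed["redaction"].append(aliquot)
        elif aliquot in blacklisted:
            removed["blacklist"].append(aliquot)
        else:
            survivors.append(aliquot)

    # Group the remaining aliquots by sample to find replicates.
    by_sample = defaultdict(list)
    for aliquot in survivors:
        by_sample[sample_of(aliquot)].append(aliquot)

    kept = []
    for group in by_sample.values():
        best = max(group, key=rank)
        kept.append(best)
        removed["replicate"].extend(a for a in group if a != best)
    return kept, dict(removed)
```

Returning the removed aliquots keyed by reason mirrors the way this report lists what was filtered out and why.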

Summary

There were 0 redactions, 1104 replicate aliquots, 0 blacklisted aliquots, and 0 FFPE aliquots. The table below shows the sample counts for those samples that were ingested into Firehose after filtering out redactions, replicates, and blacklisted data, and segregating FFPEs.

Table 1. Summary of TCGA Tumor Data. Click on a tumor type to display a tumor type specific Samples Report.

Cohort BCR CN Clinical MAF mRNA miR
ACC 92 90 92 92 79 80
BLCA 412 412 412 412 408 409
BRCA 1098 1094 1097 1044 1085 1078
CESC 308 295 307 305 304 307
CHOL 51 36 45 51 36 36
COAD 463 450 459 432 456 444
COADREAD 635 614 629 589 622 605
DLBC 58 48 48 48 48 47
ESCA 185 184 185 184 161 184
GBM 617 590 596 396 154 0
GBMLGG 1133 1104 1111 909 665 512
HNSC 528 517 528 510 500 523
KICH 113 66 113 66 65 66
KIPAN 941 886 941 693 883 873
KIRC 537 530 537 339 530 516
KIRP 291 290 291 288 288 291
LAML 200 143 200 149 151 103
LGG 516 514 515 513 511 512
LIHC 377 375 377 375 371 372
LUAD 585 518 522 569 513 513
LUSC 504 503 504 497 501 478
MESO 87 87 87 83 86 87
OV 608 568 587 441 374 489
PAAD 185 184 185 183 177 178
PANGI 1298 1240 1257 1214 1158 1225
PCPG 179 178 179 179 178 179
PRAD 500 495 500 498 495 494
READ 172 164 170 157 166 161
SARC 261 260 261 255 259 259
SKCM 470 368 470 368 367 352
STAD 478 442 443 441 375 436
STES 663 626 628 625 536 620
TGCT 150 134 134 150 150 150
THCA 507 505 507 496 502 506
THYM 124 124 124 123 119 124
UCEC 560 540 548 542 543 538
UCS 57 56 57 57 56 57
UVM 80 80 80 80 80 80
Totals 11353 10840 11160 10323 10088 10049
Results
Sample Heatmaps
TCGA-ACC

Figure 1. This figure depicts the distribution of available data on a per participant basis.

TCGA-BLCA

Figure 2. This figure depicts the distribution of available data on a per participant basis.

TCGA-BRCA

Figure 3. This figure depicts the distribution of available data on a per participant basis.

TCGA-CESC

Figure 4. This figure depicts the distribution of available data on a per participant basis.

TCGA-CHOL

Figure 5. This figure depicts the distribution of available data on a per participant basis.

TCGA-COAD

Figure 6. This figure depicts the distribution of available data on a per participant basis.

TCGA-COADREAD

Figure 7. This figure depicts the distribution of available data on a per participant basis.

TCGA-DLBC

Figure 8. This figure depicts the distribution of available data on a per participant basis.

TCGA-ESCA

Figure 9. This figure depicts the distribution of available data on a per participant basis.

TCGA-GBM

Figure 10. This figure depicts the distribution of available data on a per participant basis.

TCGA-GBMLGG

Figure 11. This figure depicts the distribution of available data on a per participant basis.

TCGA-HNSC

Figure 12. This figure depicts the distribution of available data on a per participant basis.

TCGA-KICH

Figure 13. This figure depicts the distribution of available data on a per participant basis.

TCGA-KIPAN

Figure 14. This figure depicts the distribution of available data on a per participant basis.

TCGA-KIRC

Figure 15. This figure depicts the distribution of available data on a per participant basis.

TCGA-KIRP

Figure 16. This figure depicts the distribution of available data on a per participant basis.

TCGA-LAML

Figure 17. This figure depicts the distribution of available data on a per participant basis.

TCGA-LGG

Figure 18. This figure depicts the distribution of available data on a per participant basis.

TCGA-LIHC

Figure 19. This figure depicts the distribution of available data on a per participant basis.

TCGA-LUAD

Figure 20. This figure depicts the distribution of available data on a per participant basis.

TCGA-LUSC

Figure 21. This figure depicts the distribution of available data on a per participant basis.

TCGA-MESO

Figure 22. This figure depicts the distribution of available data on a per participant basis.

TCGA-OV

Figure 23. This figure depicts the distribution of available data on a per participant basis.

TCGA-PAAD

Figure 24. This figure depicts the distribution of available data on a per participant basis.

TCGA-PANGI

Figure 25. This figure depicts the distribution of available data on a per participant basis.

TCGA-PCPG

Figure 26. This figure depicts the distribution of available data on a per participant basis.

TCGA-PRAD

Figure 27. This figure depicts the distribution of available data on a per participant basis.

TCGA-READ

Figure 28. This figure depicts the distribution of available data on a per participant basis.

TCGA-SARC

Figure 29. This figure depicts the distribution of available data on a per participant basis.

TCGA-SKCM

Figure 30. This figure depicts the distribution of available data on a per participant basis.

TCGA-STAD

Figure 31. This figure depicts the distribution of available data on a per participant basis.

TCGA-STES

Figure 32. This figure depicts the distribution of available data on a per participant basis.

TCGA-TGCT

Figure 33. This figure depicts the distribution of available data on a per participant basis.

TCGA-THCA

Figure 34. This figure depicts the distribution of available data on a per participant basis.

TCGA-THYM

Figure 35. This figure depicts the distribution of available data on a per participant basis.

TCGA-UCEC

Figure 36. This figure depicts the distribution of available data on a per participant basis.

TCGA-UCS

Figure 37. This figure depicts the distribution of available data on a per participant basis.

TCGA-UVM

Figure 38. This figure depicts the distribution of available data on a per participant basis.

FFPE Cases
[NOT YET IMPLEMENTED] Additional Annotations from the DCC's Annotations Manager
Methods & Data
Redactions and Other Annotations

Annotation data was taken from the TCGA Data Portal using the query string:

https://tcga-data.nci.nih.gov/annotations/resources/searchannotations/json?item=TCGA

Redaction information was generated by filtering for the annotationClassificationName "Redaction"

FFPE information was generated by filtering for "FFPE" in annotation note text

Additional FFPEs were garnered from clinical data

Remaining annotations were sorted into sections by annotationClassificationName
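The filtering steps above can be sketched as follows. This is a minimal illustration, assuming each annotation has already been flattened into a dict with the keys `annotationClassificationName` and `noteText`; the second key and the flat layout are assumptions, and the JSON returned by the portal may nest these fields differently. The function name is hypothetical, the "FFPE" match is taken to be case-sensitive, and an annotation that is both a redaction and mentions FFPE is reported in both lists, which is also an assumption.

```python
from collections import defaultdict

def classify_annotations(annotations):
    """Split annotation records into redactions, FFPE notes, and the rest.

    annotations -- iterable of dicts with the assumed keys
                   'annotationClassificationName' and 'noteText'
    Returns (redactions, ffpe, other), where other maps each remaining
    classification name to its records, ordered by name.
    """
    redactions, ffpe = [], []
    other = defaultdict(list)
    for record in annotations:
        name = record.get("annotationClassificationName", "")
        note = record.get("noteText", "")
        is_redaction = name == "Redaction"
        is_ffpe = "FFPE" in note
        if is_redaction:
            redactions.append(record)
        if is_ffpe:
            ffpe.append(record)
        if not (is_redaction or is_ffpe):
            other[name].append(record)
    # Sort the remaining sections by classification name.
    return redactions, ffpe, dict(sorted(other.items()))
```

FFPE cases found in clinical data would have to be merged in separately, since they do not come from the annotation query.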

Preprocessors
mRNA Preprocessor

The mRNA preprocess median module chooses the matrix from whichever platform (Affymetrix HG U133, Affymetrix Exon Array, or Agilent Gene Expression) has the largest number of samples.
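The selection rule can be illustrated with a short sketch. The function name and the dictionary layout are hypothetical stand-ins for the per-platform matrices, and the tie-breaking behavior (first platform encountered wins) is an assumption, since the report does not say how ties are resolved.

```python
def pick_largest_platform(matrices):
    """Return (platform, matrix) for the platform with the most samples.

    matrices -- dict mapping a platform name to a dict of
                sample -> expression values (hypothetical layout)
    Ties go to the first platform encountered, which is an assumption.
    """
    if not matrices:
        raise ValueError("no platform matrices supplied")
    platform = max(matrices, key=lambda name: len(matrices[name]))
    return platform, matrices[platform]
```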

mRNAseq Preprocessor

The mRNAseq preprocessor picks the "scaled_estimate" (RSEM) value from the Illumina HiSeq/GA2 mRNAseq level_3 (v2) data set and builds a log2-transformed mRNAseq matrix for downstream analysis. If a sample is present on both platforms, the Illumina HiSeq version is selected. The pipeline also creates a log2-transformed RPKM matrix from the HiSeq/GA2 mRNAseq level 3 (v1) data set.
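A minimal sketch of the merge-then-transform step is given below. The function name and input layout are hypothetical. The handling of zero estimates (mapped to NaN because log2 of zero is undefined) is an assumption; this report does not state how the pipeline treats them.

```python
import math

def build_mrnaseq_matrix(hiseq, ga):
    """Merge per-sample scaled_estimate values and log2-transform them.

    hiseq, ga -- dicts of sample -> {gene: scaled_estimate}
                 (hypothetical in-memory layout for the two platforms)
    Samples present on both platforms take their HiSeq values.
    Zero estimates become NaN (an assumption, see the note above).
    """
    merged = dict(ga)
    merged.update(hiseq)  # HiSeq overrides GA for overlapping samples
    return {
        sample: {
            gene: (math.log2(value) if value > 0 else math.nan)
            for gene, value in genes.items()
        }
        for sample, genes in merged.items()
    }
```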

miRseq Preprocessor

The miRseq preprocessor takes the "RPM" (reads per million miRNA precursor reads) values from the Illumina HiSeq/GA miRseq Level_3 data set and builds a matrix of log2-transformed values.
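To make the unit concrete, the sketch below derives RPM from raw read counts for one sample and applies the log2 transform. The preprocessor itself reads RPM values that are already present in the Level_3 files, so this only illustrates what the number means. The function name is hypothetical, the denominator is assumed to be the sum of reads over all miRNAs in the sample, and zero counts are mapped to NaN as an assumption.

```python
import math

def log2_rpm(read_counts):
    """Convert raw per-miRNA read counts for one sample to log2(RPM).

    read_counts -- dict of miRNA -> read count (hypothetical input)
    RPM is taken as count * 1e6 / total_reads, where total_reads is the
    sum over all miRNAs in the sample (an assumed denominator).
    miRNAs with zero reads map to NaN, since log2(0) is undefined.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no miRNA reads")
    result = {}
    for mirna, count in read_counts.items():
        rpm = count * 1_000_000 / total
        result[mirna] = math.log2(rpm) if rpm > 0 else math.nan
    return result
```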

Methylation Preprocessor

The methylation preprocessor filters methylation data for use in downstream pipelines. To learn more about this preprocessor, please visit the documentation.