stddata__2016_01_28 Samples Report
Overview
Introduction

The Broad GDAC mirrors data from the DCC on a daily basis. Although all data is mirrored, not every sample is ingested into Firehose. Three main mechanisms filter samples so that only the most scientifically relevant ones make it into our standard data and analysis runs: redactions, replicate filtering, and blacklisting. This report summarizes the data that is ingested into Firehose, describes the three filtering mechanisms, lists the samples that are removed, and gives all available annotations from the DCC's Annotation Manager.

Summary

There were 244 redactions, 3362 replicate aliquots, 23 blacklisted aliquots, and 481 FFPE aliquots. The table below shows the counts of samples that were ingested into Firehose after filtering out redactions, replicates, and blacklisted data, and segregating FFPEs.

Table 1. Summary of TCGA Tumor Data.

Cohort BCR Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq RPPA MAF rawMAF
ACC 92 92 90 0 80 0 79 0 80 46 90 0
BLCA 412 412 410 112 412 0 408 0 409 344 130 395
BRCA 1098 1097 1089 19 1097 526 1093 0 1078 887 977 0
CESC 307 307 295 50 307 0 304 0 307 173 194 0
CHOL 51 45 36 0 36 0 36 0 36 30 35 0
COAD 460 458 451 69 457 153 457 0 406 360 154 367
COADREAD 631 629 616 104 622 222 623 0 549 491 223 489
DLBC 58 48 48 0 48 0 48 0 47 33 48 0
ESCA 185 185 184 51 185 0 184 0 184 126 185 0
FPPP 38 38 0 0 0 0 0 0 23 0 0 0
GBM 613 595 577 0 420 540 160 565 0 238 290 290
GBMLGG 1129 1110 1090 52 936 567 676 565 512 668 576 806
HNSC 528 528 522 108 528 0 520 0 523 212 279 510
KICH 113 113 66 0 66 0 66 0 66 63 66 66
KIPAN 973 941 883 0 892 88 889 0 873 756 644 799
KIRC 537 537 528 0 535 72 533 0 516 478 417 451
KIRP 323 291 289 0 291 16 290 0 291 215 161 282
LAML 200 200 197 0 194 0 179 0 188 0 197 0
LGG 516 515 513 52 516 27 516 0 512 430 286 516
LIHC 377 377 370 0 377 0 371 0 372 63 198 373
LUAD 585 522 516 120 578 32 515 0 513 365 230 542
LUSC 504 504 501 0 503 154 501 0 478 328 178 0
MESO 87 87 87 0 87 0 87 0 87 63 0 0
OV 602 591 586 0 594 574 304 570 453 426 316 469
PAAD 185 185 184 0 184 0 178 0 178 123 150 184
PCPG 179 179 175 0 179 0 179 0 179 80 179 0
PRAD 499 499 492 115 498 0 497 0 494 352 332 498
READ 171 171 165 35 165 69 166 0 143 131 69 122
SARC 261 261 257 0 261 0 259 0 259 223 247 0
SKCM 470 470 469 118 470 0 469 0 448 353 343 366
STAD 443 443 442 107 443 0 415 0 436 357 289 395
STES 628 628 626 158 628 0 599 0 620 483 474 395
TGCT 150 134 150 0 150 0 150 0 150 118 149 0
THCA 503 503 499 98 503 0 501 0 502 222 402 496
THYM 124 124 123 0 124 0 120 0 124 90 123 0
UCEC 560 548 540 106 547 54 545 0 538 440 248 0
UCS 57 57 56 0 57 0 57 0 56 48 57 0
UVM 80 80 80 51 80 0 80 0 80 12 80 0
Totals 11368 11196 10987 1211 10972 2217 10267 1135 10156 7429 7099 6322
Results
Sample Heatmaps
ACC

Figure 1. This figure depicts the distribution of available data on a per-participant basis.

BLCA

Figure 2. This figure depicts the distribution of available data on a per-participant basis.

BRCA

Figure 3. This figure depicts the distribution of available data on a per-participant basis.

CESC

Figure 4. This figure depicts the distribution of available data on a per-participant basis.

CHOL

Figure 5. This figure depicts the distribution of available data on a per-participant basis.

COAD

Figure 6. This figure depicts the distribution of available data on a per-participant basis.

COADREAD

Figure 7. This figure depicts the distribution of available data on a per-participant basis.

DLBC

Figure 8. This figure depicts the distribution of available data on a per-participant basis.

ESCA

Figure 9. This figure depicts the distribution of available data on a per-participant basis.

FPPP

Figure 10. This figure depicts the distribution of available data on a per-participant basis.

GBM

Figure 11. This figure depicts the distribution of available data on a per-participant basis.

GBMLGG

Figure 12. This figure depicts the distribution of available data on a per-participant basis.

HNSC

Figure 13. This figure depicts the distribution of available data on a per-participant basis.

KICH

Figure 14. This figure depicts the distribution of available data on a per-participant basis.

KIPAN

Figure 15. This figure depicts the distribution of available data on a per-participant basis.

KIRC

Figure 16. This figure depicts the distribution of available data on a per-participant basis.

KIRP

Figure 17. This figure depicts the distribution of available data on a per-participant basis.

LAML

Figure 18. This figure depicts the distribution of available data on a per-participant basis.

LGG

Figure 19. This figure depicts the distribution of available data on a per-participant basis.

LIHC

Figure 20. This figure depicts the distribution of available data on a per-participant basis.

LUAD

Figure 21. This figure depicts the distribution of available data on a per-participant basis.

LUSC

Figure 22. This figure depicts the distribution of available data on a per-participant basis.

MESO

Figure 23. This figure depicts the distribution of available data on a per-participant basis.

OV

Figure 24. This figure depicts the distribution of available data on a per-participant basis.

PAAD

Figure 25. This figure depicts the distribution of available data on a per-participant basis.

PCPG

Figure 26. This figure depicts the distribution of available data on a per-participant basis.

PRAD

Figure 27. This figure depicts the distribution of available data on a per-participant basis.

READ

Figure 28. This figure depicts the distribution of available data on a per-participant basis.

SARC

Figure 29. This figure depicts the distribution of available data on a per-participant basis.

SKCM

Figure 30. This figure depicts the distribution of available data on a per-participant basis.

STAD

Figure 31. This figure depicts the distribution of available data on a per-participant basis.

STES

Figure 32. This figure depicts the distribution of available data on a per-participant basis.

TGCT

Figure 33. This figure depicts the distribution of available data on a per-participant basis.

THCA

Figure 34. This figure depicts the distribution of available data on a per-participant basis.

THYM

Figure 35. This figure depicts the distribution of available data on a per-participant basis.

UCEC

Figure 36. This figure depicts the distribution of available data on a per-participant basis.

UCS

Figure 37. This figure depicts the distribution of available data on a per-participant basis.

UVM

Figure 38. This figure depicts the distribution of available data on a per-participant basis.

FFPE Cases
Additional Annotations from the DCC's Annotations Manager
Methods & Data
Redactions and Other Annotations

Annotation data was taken from the TCGA Data Portal using the query string:

https://tcga-data.nci.nih.gov/annotations/resources/searchannotations/json?item=TCGA

Redaction information was generated by filtering for the annotationClassificationName "Redaction".

FFPE information was generated by filtering for "FFPE" in the annotation note text.

Additional FFPEs were identified from clinical data.

Remaining annotations were sorted into sections by annotationClassificationName.
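
The filtering rules above can be sketched as follows. This is a minimal illustration and not the GDAC's actual code. It assumes each annotation has already been flattened into a dictionary; the `item` and `note` key names, the function name, and the sample records are hypothetical, and the JSON returned by the service may nest these fields differently. Treating the redaction and FFPE filters as independent (so one record could appear in both lists) is also an assumption.

```python
from collections import defaultdict

def sort_annotations(annotations):
    """Split annotation records into redactions, FFPE notes, and the rest."""
    redactions = []
    ffpe = []
    other = defaultdict(list)  # remaining records, keyed by classification name
    for ann in annotations:
        classification = ann.get("annotationClassificationName", "")
        is_redaction = classification == "Redaction"
        is_ffpe = "FFPE" in ann.get("note", "")  # hypothetical key name
        if is_redaction:
            redactions.append(ann)
        if is_ffpe:
            ffpe.append(ann)
        if not (is_redaction or is_ffpe):
            other[classification].append(ann)
    return redactions, ffpe, dict(other)

# Hypothetical records, for illustration only.
example = [
    {"item": "TCGA-XX-0001", "annotationClassificationName": "Redaction",
     "note": "Subject withdrew consent"},
    {"item": "TCGA-XX-0002", "annotationClassificationName": "Notification",
     "note": "FFPE sample"},
    {"item": "TCGA-XX-0003", "annotationClassificationName": "Observation",
     "note": "Prior malignancy"},
]
redactions, ffpe, other = sort_annotations(example)
```

FFPE cases found only in clinical data are outside the scope of this sketch, since the report does not say which clinical field is used.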

Preprocessors
mRNA Preprocessor

The mRNA preprocess median module chooses the matrix from whichever platform (Affymetrix HG U133, Affymetrix Exon Array, or Agilent Gene Expression) has the largest number of samples.
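
The selection rule can be sketched in a few lines. This is an illustration under stated assumptions, not the module's code: each platform's matrix is represented as a dictionary from sample ID to expression values, and the function name, platform keys, and sample IDs are hypothetical. The report does not say how ties are broken; here `max` simply returns the first platform encountered with the largest count.

```python
def pick_largest_platform(matrices):
    """Return (platform, matrix) for the platform with the most samples.

    `matrices` maps a platform name to a dict of sample ID -> expression values.
    """
    platform = max(matrices, key=lambda name: len(matrices[name]))
    return platform, matrices[platform]

# Hypothetical toy input: two samples on one platform, one on another.
matrices = {
    "affymetrix_hg_u133": {"S1": [1.2, 3.4], "S2": [0.9, 2.8]},
    "agilent_gene_expression": {"S1": [1.1, 3.0]},
}
chosen_platform, chosen_matrix = pick_largest_platform(matrices)
```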

mRNAseq Preprocessor

The mRNAseq preprocessor picks the "scaled_estimate" (RSEM) value from the Illumina HiSeq/GA2 mRNAseq level_3 (v2) data set and builds a log2-transformed mRNAseq matrix for downstream analysis. If a sample appears on both platforms, the Illumina HiSeq version is selected. The pipeline also creates a log2-transformed RPKM matrix from the HiSeq/GA2 mRNAseq level 3 (v1) data set.
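
The platform preference and log2 transform described above can be sketched as below. This is a minimal sketch, not the pipeline's implementation: the function name and the nested-dictionary layout (sample ID to gene to scaled_estimate) are assumptions, the toy values are chosen for easy arithmetic rather than realism, and mapping non-positive values to NaN is an assumption because the report does not say how zeros are handled.

```python
import math

def build_mrnaseq_matrix(hiseq, ga):
    """Merge per-sample values, preferring HiSeq on overlap, then log2-transform.

    `hiseq` and `ga` map sample ID -> {gene: scaled_estimate}.
    """
    merged = dict(ga)
    merged.update(hiseq)  # HiSeq entries overwrite GA entries for shared samples
    return {
        sample: {
            gene: math.log2(value) if value > 0 else float("nan")
            for gene, value in genes.items()
        }
        for sample, genes in merged.items()
    }

# Hypothetical toy input: sample S1 was run on both platforms.
hiseq = {"S1": {"GENE_A": 4.0}}
ga = {"S1": {"GENE_A": 8.0}, "S2": {"GENE_A": 0.0}}
matrix = build_mrnaseq_matrix(hiseq, ga)
```

In this toy input the S1 row comes from HiSeq, and the zero value for S2 becomes NaN instead of raising an error.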

miRseq Preprocessor

The miRseq preprocessor picks the "RPM" (reads per million miRNA precursor reads) value from the Illumina HiSeq/GA miRseq Level_3 data set and builds a matrix of log2-transformed values.

Methylation Preprocessor

The methylation preprocessor filters methylation data for use in downstream pipelines. To learn more about this preprocessor, please visit the documentation.