stddata__2014_03_16 Samples Report
Overview
Introduction

The Broad GDAC mirrors data from the DCC on a daily basis. Although all data is mirrored, not every sample is ingested into Firehose. Three main mechanisms filter samples so that only the most scientifically relevant ones make it into our standard data and analyses runs: redactions, replicate filtering, and blacklisting. This report summarizes the data that is ingested into Firehose, describes the three filtering mechanisms, lists the samples that are removed, and gives all available annotations from the DCC's Annotation Manager.
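As a rough illustration of how the three filters and the FFPE segregation could combine, here is a minimal sketch. It is not the Firehose implementation: the function name `filter_aliquots`, the assumption that every input holds aliquot barcodes, the 15-character replicate key, and the `choose=max` replicate pick are all hypothetical placeholders.

```python
from collections import defaultdict


def filter_aliquots(aliquots, redacted, blacklisted, ffpe, choose=max):
    """Return (ingested, ffpe_only) lists of aliquot barcodes.

    All arguments are collections of barcode strings. This is a sketch
    under assumptions, not the actual Firehose logic.
    """
    # Redactions and blacklisting: drop the aliquot outright.
    kept = [a for a in aliquots if a not in redacted and a not in blacklisted]

    # FFPE aliquots are segregated rather than discarded.
    ffpe_only = sorted(a for a in kept if a in ffpe)

    # Replicate filtering: group aliquots that share a key and keep one.
    # ASSUMPTION: the first 15 characters identify a replicate group.
    groups = defaultdict(list)
    for a in kept:
        if a not in ffpe:
            groups[a[:15]].append(a)

    # ASSUMPTION: `choose` stands in for the real replicate-selection rule.
    ingested = sorted(choose(group) for group in groups.values())
    return ingested, ffpe_only


if __name__ == "__main__":
    demo = [
        "TCGA-AA-0001-01A-01D-0001-01",
        "TCGA-AA-0001-01A-02D-0001-01",  # replicate of the line above
        "TCGA-AA-0002-01A-01D-0001-01",  # redacted
        "TCGA-AA-0003-01A-01D-0001-01",
        "TCGA-AA-0004-01A-01D-0001-01",  # FFPE
    ]
    print(filter_aliquots(
        demo,
        redacted={"TCGA-AA-0002-01A-01D-0001-01"},
        blacklisted=set(),
        ffpe={"TCGA-AA-0004-01A-01D-0001-01"},
    ))
```

The barcodes in the demo are made up for illustration only.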

Summary

There were 160 redactions, 2852 replicate aliquots, 23 blacklisted aliquots, and 577 FFPE aliquots. The table below shows the sample counts for those samples that were ingested into Firehose after filtering out redactions, replicates, and blacklisted data, and segregating FFPEs.

Table 1.  Summary of TCGA Tumor Data.

Cohort BCR Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq RPPA MAF
ACC 92 15 90 0 80 0 79 0 80 0 90
BLCA 268 198 252 112 242 0 241 0 241 127 130
BRCA 1061 981 1041 19 1024 526 1037 0 1021 408 976
CESC 242 127 192 0 189 0 185 0 200 0 39
COAD 448 436 427 69 434 153 432 0 406 331 154
COADREAD 616 604 589 104 596 222 595 0 549 461 223
DLBC 38 21 28 0 28 0 28 0 27 0 0
ESCA 165 39 97 32 93 0 0 0 72 0 0
GBM 607 578 570 0 414 540 160 565 0 214 290
HNSC 517 408 509 108 457 0 497 0 512 212 306
KICH 113 93 66 0 66 0 66 0 66 0 66
KIRC 536 507 514 0 511 72 518 0 502 454 417
KIRP 259 164 182 0 198 16 172 0 198 0 168
LAML 200 200 197 0 194 0 179 0 188 0 197
LGG 516 305 463 52 403 27 463 0 438 258 289
LIHC 240 151 190 0 194 0 191 0 200 0 0
LUAD 563 466 493 120 555 32 488 0 491 237 229
LUSC 493 411 490 0 492 154 489 0 467 195 178
MESO 37 13 37 0 37 0 0 0 0 0 0
OV 592 580 576 0 584 574 296 570 453 412 316
PAAD 113 73 91 0 91 0 85 0 85 0 91
PANCAN12 6009 5415 5756 569 5601 2174 4987 1135 4937 2920 3510
PCPG 179 10 0 0 179 0 0 0 0 0 0
PRAD 427 199 331 115 336 0 297 0 326 164 261
READ 168 168 162 35 162 69 163 0 143 130 69
SARC 199 102 137 0 170 0 103 0 136 0 0
SKCM 431 341 385 119 374 0 372 0 354 205 344
STAD 373 311 352 107 373 0 274 0 323 264 221
THCA 496 484 494 98 496 0 494 0 495 222 402
UCEC 556 482 525 106 532 54 527 0 513 200 248
UCS 57 57 56 0 57 0 57 0 56 0 57
Totals 9986 7920 8947 1092 8965 2217 7893 1135 7993 4033 5538
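The rows above are whitespace-separated, so they can be read back with a few lines of code. The sketch below parses three rows copied from Table 1 and checks an observation about this particular table: the COADREAD row equals the column-wise sum of the COAD and READ rows. The helper name `parse_rows` is made up for illustration.

```python
HEADER = "Cohort BCR Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq RPPA MAF".split()

# Three rows copied verbatim from Table 1.
ROWS = """\
COAD 448 436 427 69 434 153 432 0 406 331 154
READ 168 168 162 35 162 69 163 0 143 130 69
COADREAD 616 604 589 104 596 222 595 0 549 461 223
"""


def parse_rows(text):
    """Map each cohort name to a {column: count} dictionary."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        counts = [int(value) for value in fields[1:]]
        table[fields[0]] = dict(zip(HEADER[1:], counts))
    return table


table = parse_rows(ROWS)

# Column-wise sum of the two component cohorts.
combined = {col: table["COAD"][col] + table["READ"][col] for col in HEADER[1:]}

print(combined == table["COADREAD"])  # prints True
```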
Results
Sample Heatmaps
ACC

Figure 1.  This figure depicts the distribution of available data on a per participant basis.

BLCA

Figure 2.  This figure depicts the distribution of available data on a per participant basis.

BRCA

Figure 3.  This figure depicts the distribution of available data on a per participant basis.

CESC

Figure 4.  This figure depicts the distribution of available data on a per participant basis.

COAD

Figure 5.  This figure depicts the distribution of available data on a per participant basis.

COADREAD

Figure 6.  This figure depicts the distribution of available data on a per participant basis.

DLBC

Figure 7.  This figure depicts the distribution of available data on a per participant basis.

ESCA

Figure 8.  This figure depicts the distribution of available data on a per participant basis.

GBM

Figure 9.  This figure depicts the distribution of available data on a per participant basis.

HNSC

Figure 10.  This figure depicts the distribution of available data on a per participant basis.

KICH

Figure 11.  This figure depicts the distribution of available data on a per participant basis.

KIRC

Figure 12.  This figure depicts the distribution of available data on a per participant basis.

KIRP

Figure 13.  This figure depicts the distribution of available data on a per participant basis.

LAML

Figure 14.  This figure depicts the distribution of available data on a per participant basis.

LGG

Figure 15.  This figure depicts the distribution of available data on a per participant basis.

LIHC

Figure 16.  This figure depicts the distribution of available data on a per participant basis.

LUAD

Figure 17.  This figure depicts the distribution of available data on a per participant basis.

LUSC

Figure 18.  This figure depicts the distribution of available data on a per participant basis.

MESO

Figure 19.  This figure depicts the distribution of available data on a per participant basis.

OV

Figure 20.  This figure depicts the distribution of available data on a per participant basis.

PAAD

Figure 21.  This figure depicts the distribution of available data on a per participant basis.

PANCAN12

Figure 22.  This figure depicts the distribution of available data on a per participant basis.

PCPG

Figure 23.  This figure depicts the distribution of available data on a per participant basis.

PRAD

Figure 24.  This figure depicts the distribution of available data on a per participant basis.

READ

Figure 25.  This figure depicts the distribution of available data on a per participant basis.

SARC

Figure 26.  This figure depicts the distribution of available data on a per participant basis.

SKCM

Figure 27.  This figure depicts the distribution of available data on a per participant basis.

STAD

Figure 28.  This figure depicts the distribution of available data on a per participant basis.

THCA

Figure 29.  This figure depicts the distribution of available data on a per participant basis.

UCEC

Figure 30.  This figure depicts the distribution of available data on a per participant basis.

UCS

Figure 31.  This figure depicts the distribution of available data on a per participant basis.

FFPE Cases
Additional Annotations from the DCC's Annotations Manager
Methods & Data
Redactions and Other Annotations

Annotation data was taken from the TCGA Data Portal using the query string:

https://tcga-data.nci.nih.gov/annotations/resources/searchannotations/json?item=TCGA

Redaction information was generated by filtering for the annotationClassificationName "Redaction".

FFPE information was generated by filtering for "FFPE" in the annotation note text.

Additional FFPEs were identified from clinical data.

Remaining annotations were sorted into sections by annotationClassificationName.
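The sorting described above can be sketched as follows. This is a minimal illustration, not the report's actual code: it assumes the annotations have already been parsed into flat dictionaries, the `note` key is a hypothetical name for the annotation note text, and giving "Redaction" precedence over an "FFPE" match is an assumption. Only `annotationClassificationName` comes from the description above.

```python
def split_annotations(annotations):
    """Split annotation records into redactions, FFPE notes, and the rest.

    Returns (redactions, ffpe, other), where `other` maps each
    annotationClassificationName to the list of remaining records.
    """
    redactions, ffpe, other = [], [], {}
    for ann in annotations:
        classification = ann.get("annotationClassificationName", "")
        note = ann.get("note", "")  # hypothetical key for the note text
        if classification == "Redaction":
            redactions.append(ann)
        elif "FFPE" in note:
            ffpe.append(ann)
        else:
            other.setdefault(classification, []).append(ann)
    return redactions, ffpe, other


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        {"annotationClassificationName": "Redaction", "note": "withdrawn"},
        {"annotationClassificationName": "Notification", "note": "FFPE sample"},
        {"annotationClassificationName": "Observation", "note": "prior malignancy"},
    ]
    redactions, ffpe, other = split_annotations(records)
    print(len(redactions), len(ffpe), sorted(other))
```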

Preprocessors
mRNA Preprocessor

The mRNA preprocess median module chooses the matrix from whichever platform (Affymetrix HG U133, Affymetrix Exon Array, or Agilent Gene Expression) has the largest number of samples.
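The selection rule can be sketched in a few lines. This is an illustrative sketch only: the function name `pick_largest_platform` and the input shape (platform name mapped to a list of sample IDs) are assumptions, and ties fall to whichever platform is listed first, which the description above does not specify.

```python
def pick_largest_platform(samples_by_platform):
    """Return the platform name whose sample list is longest.

    `samples_by_platform` maps a platform name to its sample IDs.
    On a tie, the first platform in iteration order wins (an assumption).
    """
    return max(samples_by_platform, key=lambda name: len(samples_by_platform[name]))


if __name__ == "__main__":
    # Made-up sample IDs, for illustration only.
    example = {
        "Affymetrix HG U133": ["s1", "s2"],
        "Affymetrix Exon Array": ["s1"],
        "Agilent Gene Expression": ["s1", "s2", "s3"],
    }
    print(pick_largest_platform(example))
```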

mRNAseq Preprocessor

The mRNAseq preprocessor picks the "scaled_estimate" (RSEM) value from the Illumina HiSeq/GA2 mRNAseq level_3 (v2) data set and builds a log2-transformed mRNAseq matrix for downstream analysis. If the same samples appear on both platforms, the samples from Illumina HiSeq are selected. The pipeline also creates a log2-transformed RPKM matrix from the HiSeq/GA2 mRNAseq level 3 (v1) data set.
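The two steps described here, preferring HiSeq for overlapping samples and applying a log2 transform, might look like the sketch below. The function names, the nested-dictionary layout (sample to gene to value), and the choice to map non-positive values to NaN are all assumptions made for illustration; the text above does not say how zeros are handled.

```python
import math


def merge_prefer_hiseq(hiseq, ga):
    """Combine two {sample: {gene: value}} dicts, keeping HiSeq on overlap."""
    merged = dict(ga)
    merged.update(hiseq)  # HiSeq entries overwrite GA entries for shared samples
    return merged


def log2_matrix(matrix):
    """Apply log2 to every value; non-positive values become NaN (assumption)."""
    return {
        sample: {
            gene: math.log2(value) if value > 0 else float("nan")
            for gene, value in genes.items()
        }
        for sample, genes in matrix.items()
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    hiseq = {"S1": {"TP53": 0.25}}
    ga = {"S1": {"TP53": 0.5}, "S2": {"TP53": 1.0}}
    print(log2_matrix(merge_prefer_hiseq(hiseq, ga)))
```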

miRseq Preprocessor

The miRseq preprocessor picks the "RPM" (reads per million miRNA precursor reads) value from the Illumina HiSeq/GA miRseq Level_3 data set and builds a matrix of log2-transformed values.

Methylation Preprocessor

The methylation preprocessor filters methylation data for use in downstream pipelines. To learn more about this preprocessor, please visit the documentation.