stddata__2017_09_30 Samples Report
Overview
Introduction

This is a summary of data mirrored from the Genomic Data Commons (GDC) and processed by the GDCtools package. Some sample data are filtered out as unsuitable for downstream pipelines through one of three mechanisms: redactions, replicate filtering, and blacklisting. The report lists the counts and types of the sample data, in both hyperlinked tables and heatmap images; describes the three filtering mechanisms; lists the samples removed by filtering and why they were removed; and will eventually catalog how the data have been annotated by the projects that submitted them to the GDC.

Summary

There were 0 redactions, 1138 replicate aliquots, 0 blacklisted aliquots, and 0 FFPE (formalin-fixed, paraffin-embedded) aliquots. The table below gives the sample counts for the samples ingested into Firehose after redactions, replicates, and blacklisted data were filtered out and FFPE aliquots were set aside.
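The report does not say in what order the filters run or how one aliquot is chosen among replicates. The sketch below is therefore only an illustration under stated assumptions: redactions and blacklists are treated as sets of aliquot barcodes, FFPE aliquots are set aside rather than discarded, and the replicate rule (keep the lexicographically greatest barcode per sample) is hypothetical, standing in for whatever rule GDCtools actually applies. The `Aliquot` type and all barcodes are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Aliquot:
    barcode: str          # hypothetical aliquot identifier
    sample: str           # identifier of the sample the aliquot came from
    ffpe: bool = False    # formalin-fixed, paraffin-embedded?

def filter_aliquots(aliquots, redacted, blacklisted):
    """Partition aliquots into redacted, blacklisted, FFPE, replicate, and kept groups.

    ASSUMPTIONS: `redacted` and `blacklisted` are sets of barcodes, and the
    replicate rule (keep the lexicographically greatest barcode per sample)
    is hypothetical, not necessarily the rule GDCtools uses.
    """
    groups = {"redacted": [], "blacklisted": [], "ffpe": [], "replicate": [], "kept": []}
    candidates = {}  # sample -> aliquots that survived the earlier filters
    for a in aliquots:
        if a.barcode in redacted:
            groups["redacted"].append(a)
        elif a.barcode in blacklisted:
            groups["blacklisted"].append(a)
        elif a.ffpe:
            groups["ffpe"].append(a)   # set aside, not discarded
        else:
            candidates.setdefault(a.sample, []).append(a)
    for group in candidates.values():
        ordered = sorted(group, key=lambda a: a.barcode)
        groups["kept"].append(ordered[-1])        # one aliquot per sample survives
        groups["replicate"].extend(ordered[:-1])  # the rest count as replicates
    return groups

aliquots = [
    Aliquot("S1-A", "S1"), Aliquot("S1-B", "S1"),   # S1 has a replicate
    Aliquot("S2-A", "S2"),
    Aliquot("S3-A", "S3", ffpe=True),
    Aliquot("S4-A", "S4"),
]
groups = filter_aliquots(aliquots, redacted=set(), blacklisted={"S4-A"})
print({name: len(items) for name, items in groups.items()})
# {'redacted': 0, 'blacklisted': 1, 'ffpe': 1, 'replicate': 1, 'kept': 2}
```

Counting the `replicate` group over a full data set yields a figure comparable to the "1138 replicate aliquots" reported above, provided the real replicate rule is substituted for the hypothetical one.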

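Table 1. Summary of TCGA Tumor Data. BCR = Biospecimen Core Resource (biospecimen) data; CN = copy number; miR = microRNA; MAF = Mutation Annotation Format (mutation calls). Aggregate cohorts (TCGA-COADREAD, TCGA-GBMLGG, TCGA-KIPAN, TCGA-PANGI, TCGA-STES) are not counted again in the Totals row. Each tumor type links to a tumor-type-specific Samples Report.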

Cohort           BCR  Clinical     CN   mRNA    miR    MAF  Methylation
TCGA-ACC          92        92     90     79     80     92           80
TCGA-BLCA        412       412    412    408    409    412          412
TCGA-BRCA       1097      1096   1094   1085   1078   1044         1094
TCGA-CESC        307       307    295    304    307    305          307
TCGA-CHOL         51        45     36     36     36     51           36
TCGA-COAD        461       459    450    456    444    432          458
TCGA-COADREAD    633       629    614    622    605    589          623
TCGA-DLBC         58        48     48     48     47     48           48
TCGA-ESCA        185       185    184    161    184    184          185
TCGA-GBM         616       595    590    154      0    396          422
TCGA-GBMLGG     1131      1109   1104    665    512    909          937
TCGA-HNSC        527       527    517    500    523    510          527
TCGA-KICH        113       113     66     65     66     66           66
TCGA-KIPAN       940       940    886    883    873    693          889
TCGA-KIRC        536       536    530    530    516    339          532
TCGA-KIRP        291       291    290    288    291    288          291
TCGA-LAML        200       200    143    151    103    149          140
TCGA-LGG         515       514    514    511    512    513          515
TCGA-LIHC        377       377    375    371    372    375          377
TCGA-LUAD        584       521    518    513    513    569          577
TCGA-LUSC        503       503    503    501    478    497          502
TCGA-MESO         87        87     87     86     87     83           87
TCGA-OV          607       586    568    374    489    441          591
TCGA-PAAD        185       185    184    177    178    183          184
TCGA-PANGI      1261      1257   1240   1158   1225   1214         1251
TCGA-PCPG        179       179    178    178    179    179          179
TCGA-PRAD        500       500    495    495    494    498          497
TCGA-READ        172       170    164    166    161    157          165
TCGA-SARC        261       261    260    259    259    255          261
TCGA-SKCM        470       470    368    367    352    368          368
TCGA-STAD        443       443    442    375    436    441          443
TCGA-STES        628       628    626    536    620    625          628
TCGA-TGCT        150       134    134    150    150    150          150
TCGA-THCA        506       506    505    502    506    496          506
TCGA-THYM        124       124    124    119    124    123          124
TCGA-UCEC        559       547    540    543    538    542          546
TCGA-UCS          57        57     56     56     57     57           57
TCGA-UVM          80        80     80     80     80     80           80
Totals         11305     11150  10840  10088  10049  10323        10807
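Table 1 can be checked for internal consistency: the Totals row should equal the column sums over the single-tumor cohorts, and each aggregate cohort should equal the sum of its member cohorts. The membership used below (COADREAD = COAD + READ, GBMLGG = GBM + LGG, KIPAN = KICH + KIRC + KIRP, STES = ESCA + STAD, PANGI = COAD + READ + ESCA + STAD) is an assumption inferred from the cohort names; the report does not list it. `parse_table` and `check_table` are illustrative helpers, not part of GDCtools.

```python
TABLE_TEXT = """\
Cohort BCR Clinical CN mRNA miR MAF Methylation
TCGA-ACC 92 92 90 79 80 92 80
TCGA-BLCA 412 412 412 408 409 412 412
TCGA-BRCA 1097 1096 1094 1085 1078 1044 1094
TCGA-CESC 307 307 295 304 307 305 307
TCGA-CHOL 51 45 36 36 36 51 36
TCGA-COAD 461 459 450 456 444 432 458
TCGA-COADREAD 633 629 614 622 605 589 623
TCGA-DLBC 58 48 48 48 47 48 48
TCGA-ESCA 185 185 184 161 184 184 185
TCGA-GBM 616 595 590 154 0 396 422
TCGA-GBMLGG 1131 1109 1104 665 512 909 937
TCGA-HNSC 527 527 517 500 523 510 527
TCGA-KICH 113 113 66 65 66 66 66
TCGA-KIPAN 940 940 886 883 873 693 889
TCGA-KIRC 536 536 530 530 516 339 532
TCGA-KIRP 291 291 290 288 291 288 291
TCGA-LAML 200 200 143 151 103 149 140
TCGA-LGG 515 514 514 511 512 513 515
TCGA-LIHC 377 377 375 371 372 375 377
TCGA-LUAD 584 521 518 513 513 569 577
TCGA-LUSC 503 503 503 501 478 497 502
TCGA-MESO 87 87 87 86 87 83 87
TCGA-OV 607 586 568 374 489 441 591
TCGA-PAAD 185 185 184 177 178 183 184
TCGA-PANGI 1261 1257 1240 1158 1225 1214 1251
TCGA-PCPG 179 179 178 178 179 179 179
TCGA-PRAD 500 500 495 495 494 498 497
TCGA-READ 172 170 164 166 161 157 165
TCGA-SARC 261 261 260 259 259 255 261
TCGA-SKCM 470 470 368 367 352 368 368
TCGA-STAD 443 443 442 375 436 441 443
TCGA-STES 628 628 626 536 620 625 628
TCGA-TGCT 150 134 134 150 150 150 150
TCGA-THCA 506 506 505 502 506 496 506
TCGA-THYM 124 124 124 119 124 123 124
TCGA-UCEC 559 547 540 543 538 542 546
TCGA-UCS 57 57 56 56 57 57 57
TCGA-UVM 80 80 80 80 80 80 80
Totals 11305 11150 10840 10088 10049 10323 10807
"""

# ASSUMED aggregate membership, inferred from the cohort names.
AGGREGATES = {
    "TCGA-COADREAD": ["TCGA-COAD", "TCGA-READ"],
    "TCGA-GBMLGG": ["TCGA-GBM", "TCGA-LGG"],
    "TCGA-KIPAN": ["TCGA-KICH", "TCGA-KIRC", "TCGA-KIRP"],
    "TCGA-STES": ["TCGA-ESCA", "TCGA-STAD"],
    "TCGA-PANGI": ["TCGA-COAD", "TCGA-READ", "TCGA-ESCA", "TCGA-STAD"],
}

def parse_table(text):
    """Return (column names, {cohort: [counts]}) from whitespace-separated rows."""
    lines = text.strip().splitlines()
    columns = lines[0].split()[1:]
    rows = {}
    for line in lines[1:]:
        name, *values = line.split()
        rows[name] = [int(v) for v in values]
    return columns, rows

def check_table(text):
    """List every mismatch; an empty list means the table is consistent."""
    columns, rows = parse_table(text)
    totals = rows.pop("Totals")
    base = [name for name in rows if name not in AGGREGATES]
    problems = []
    for i, col in enumerate(columns):
        # Aggregates are excluded here; including them would double-count samples.
        expected = sum(rows[name][i] for name in base)
        if expected != totals[i]:
            problems.append(f"Totals/{col}: sum is {expected}, table says {totals[i]}")
    for agg, members in AGGREGATES.items():
        for i, col in enumerate(columns):
            expected = sum(rows[m][i] for m in members)
            if expected != rows[agg][i]:
                problems.append(f"{agg}/{col}: sum is {expected}, table says {rows[agg][i]}")
    return problems

print(check_table(TABLE_TEXT))  # prints [] for the counts in Table 1
```

Because every column of Table 1 passes both checks, a non-empty result after a future data refresh would point to either a transcription error or a change in how aggregates are assembled.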
Results
Sample Heatmaps
TCGA-ACC

Figure 1. This figure depicts the distribution of available data on a per-participant basis.

TCGA-BLCA

Figure 2. This figure depicts the distribution of available data on a per-participant basis.

TCGA-BRCA

Figure 3. This figure depicts the distribution of available data on a per-participant basis.

TCGA-CESC

Figure 4. This figure depicts the distribution of available data on a per-participant basis.

TCGA-CHOL

Figure 5. This figure depicts the distribution of available data on a per-participant basis.

TCGA-COAD

Figure 6. This figure depicts the distribution of available data on a per-participant basis.

TCGA-COADREAD

Figure 7. This figure depicts the distribution of available data on a per-participant basis.

TCGA-DLBC

Figure 8. This figure depicts the distribution of available data on a per-participant basis.

TCGA-ESCA

Figure 9. This figure depicts the distribution of available data on a per-participant basis.

TCGA-GBM

Figure 10. This figure depicts the distribution of available data on a per-participant basis.

TCGA-GBMLGG

Figure 11. This figure depicts the distribution of available data on a per-participant basis.

TCGA-HNSC

Figure 12. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KICH

Figure 13. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIPAN

Figure 14. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIRC

Figure 15. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIRP

Figure 16. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LAML

Figure 17. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LGG

Figure 18. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LIHC

Figure 19. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LUAD

Figure 20. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LUSC

Figure 21. This figure depicts the distribution of available data on a per-participant basis.

TCGA-MESO

Figure 22. This figure depicts the distribution of available data on a per-participant basis.

TCGA-OV

Figure 23. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PAAD

Figure 24. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PANGI

Figure 25. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PCPG

Figure 26. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PRAD

Figure 27. This figure depicts the distribution of available data on a per-participant basis.

TCGA-READ

Figure 28. This figure depicts the distribution of available data on a per-participant basis.

TCGA-SARC

Figure 29. This figure depicts the distribution of available data on a per-participant basis.

TCGA-SKCM

Figure 30. This figure depicts the distribution of available data on a per-participant basis.

TCGA-STAD

Figure 31. This figure depicts the distribution of available data on a per-participant basis.

TCGA-STES

Figure 32. This figure depicts the distribution of available data on a per-participant basis.

TCGA-TGCT

Figure 33. This figure depicts the distribution of available data on a per-participant basis.

TCGA-THCA

Figure 34. This figure depicts the distribution of available data on a per-participant basis.

TCGA-THYM

Figure 35. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UCEC

Figure 36. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UCS

Figure 37. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UVM

Figure 38. This figure depicts the distribution of available data on a per-participant basis.

Methods & Data
Redactions and Other Annotations

NOT IMPLEMENTED YET: redactions are not yet exposed by the GDC. For examples of the annotation-based filtering previously performed by the Broad GDAC Firehose pipeline, see this legacy GDAC Firehose sample report.

Preprocessors
mRNA Preprocessor

The mRNA preprocess median module chooses the matrix from whichever platform (Affymetrix HG U133, Affymetrix Exon Array, or Agilent Gene Expression) has the largest number of samples.
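The platform-selection rule amounts to an argmax over sample counts. The sketch below assumes each platform's matrix is represented simply by the list of sample IDs it covers; `choose_platform` is a hypothetical helper, and the report does not say how ties are broken (here the first platform listed wins, which is how Python's `max` behaves).

```python
def choose_platform(matrices):
    """Pick the platform whose expression matrix covers the most samples.

    `matrices` maps a platform name to the sample IDs (matrix columns)
    available on it. Ties go to the platform listed first; this is an
    assumption, since the report does not specify tie-breaking.
    """
    if not matrices:
        raise ValueError("no platform matrices supplied")
    best = max(matrices, key=lambda platform: len(matrices[platform]))
    return best, matrices[best]

matrices = {  # hypothetical sample IDs
    "Affymetrix HG U133": ["S1", "S2", "S3"],
    "Affymetrix Exon Array": ["S1"],
    "Agilent Gene Expression": ["S1", "S2"],
}
platform, samples = choose_platform(matrices)
print(platform, len(samples))  # Affymetrix HG U133 3
```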

mRNAseq Preprocessor

The mRNAseq preprocessor takes the RSEM "scaled_estimate" values from the Illumina HiSeq/GA2 mRNAseq Level 3 (v2) data set and builds a log2-transformed mRNAseq matrix for downstream analysis. If a sample was profiled on both platforms, the Illumina HiSeq data are used. The pipeline also builds a log2-transformed RPKM matrix from the HiSeq/GA2 mRNAseq Level 3 (v1) data set.
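The two steps described above, preferring HiSeq when a sample appears on both platforms and then log2-transforming, can be sketched as follows. The input layout (sample ID mapped to a dict of gene values) is hypothetical, and so is the treatment of zeros: the report does not say how values with no finite log2 are handled, so this sketch records them as `None`.

```python
import math

def log2_or_none(value):
    # ASSUMPTION: non-positive values have no log2 and are stored as missing (None).
    return math.log2(value) if value > 0 else None

def build_log2_matrix(hiseq, ga2):
    """Merge per-sample values from two platforms and log2-transform them.

    Inputs map sample ID -> {gene: value} (a hypothetical layout). A sample
    profiled on both platforms keeps its Illumina HiSeq values.
    """
    merged = {**ga2, **hiseq}  # keys from the right-hand dict (HiSeq) win
    return {
        sample: {gene: log2_or_none(v) for gene, v in genes.items()}
        for sample, genes in merged.items()
    }

hiseq = {"S1": {"GENE_A": 0.25}}
ga2 = {"S1": {"GENE_A": 0.5}, "S2": {"GENE_A": 0.0}}
print(build_log2_matrix(hiseq, ga2))
# {'S1': {'GENE_A': -2.0}, 'S2': {'GENE_A': None}}
```

The whole profile of an overlapping sample is taken from HiSeq rather than mixed gene by gene, which keeps each column of the matrix from a single platform. The same function would apply unchanged to the RPKM (v1) values.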

miRseq Preprocessor

The miRseq preprocessor picks the "RPM" (reads per million miRNA precursor reads) values from the Illumina HiSeq/GA miRseq Level 3 data set and builds a matrix of log2-transformed values.
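The preprocessor reads RPM values that are already present in the Level 3 data; the sketch below recomputes them from raw counts only to make the definition concrete: RPM = (reads for one miRNA precursor / total miRNA precursor reads in the sample) × 10^6, followed by log2. The count layout and the `None` handling of zero values are assumptions for illustration.

```python
import math

def rpm(counts):
    """Reads per million: each precursor's reads / total precursor reads * 1e6.

    `counts` maps a miRNA precursor ID to its read count in one sample
    (hypothetical input); the total is taken over all precursors supplied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no miRNA precursor reads")
    return {mirna: reads / total * 1_000_000 for mirna, reads in counts.items()}

def log2_rpm(counts):
    # ASSUMPTION: zero-RPM entries have no log2 and are stored as None.
    return {m: (math.log2(v) if v > 0 else None) for m, v in rpm(counts).items()}

counts = {"mir-x": 3, "mir-y": 1, "mir-z": 0}  # hypothetical read counts
print(rpm(counts))  # {'mir-x': 750000.0, 'mir-y': 250000.0, 'mir-z': 0.0}
```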

Methylation Preprocessor

The methylation preprocessor filters methylation data for use in downstream pipelines. To learn more about this preprocessor, please visit the documentation.