stddata__2016_11_03 Samples Report

Overview

Introduction

The Broad GDAC mirrors data from the DCC on a daily basis. Although all data is mirrored, not every sample is ingested into Firehose. Three main mechanisms filter samples so that only the most scientifically relevant ones make it into our standard data and analysis runs: redactions, replicate filtering, and blacklisting. This report summarizes the data ingested into Firehose, describes the three filtering mechanisms, lists the samples that were removed, and gives all available annotations from the DCC's Annotations Manager.

Summary

There were 0 redactions, 1139 replicate aliquots, 0 blacklisted aliquots, and 0 FFPE aliquots. The table below gives the sample counts for those samples that were ingested into Firehose after filtering out redactions, replicates, and blacklisted data, and segregating FFPEs.

Table 1. Summary of TCGA Tumor Data.

Cohort	BCR	CN	Clinical	MAF	Methylation	mRNA	miR
ACC	92	90	92	92	80	79	80
BLCA	412	412	412	412	412	408	409
BRCA	1098	1094	1097	1044	1095	1085	1078
CESC	308	295	307	305	307	304	307
CHOL	51	36	45	51	36	36	36
COAD	463	450	459	432	459	456	444
COADREAD	635	614	629	589	624	622	605
DLBC	58	48	48	48	48	48	47
ESCA	185	184	185	184	185	161	184
GBM	617	590	596	396	422	154	0
GBMLGG	1133	1104	1111	909	938	665	512
HNSC	528	517	528	510	528	500	523
KICH	113	66	113	66	66	65	66
KIPAN	941	886	941	693	890	883	873
KIRC	537	530	537	339	533	530	516
KIRP	291	290	291	288	291	288	291
LAML	200	143	200	149	140	151	103
LGG	516	514	515	513	516	511	512
LIHC	377	375	377	375	377	371	372
LUAD	585	518	522	569	578	513	513
LUSC	504	503	504	497	503	501	478
MESO	87	87	87	83	87	86	87
OV	608	568	587	441	592	374	489
PAAD	185	184	185	183	184	177	178
PANGI	1298	1240	1257	1214	1287	1158	1225
PCPG	179	178	179	179	179	178	179
PRAD	500	495	500	498	498	495	494
READ	172	164	170	157	165	166	161
SARC	261	260	261	255	261	259	259
SKCM	470	368	470	368	368	367	352
STAD	478	442	443	441	478	375	436
STES	663	626	628	625	663	536	620
TGCT	150	134	134	150	150	150	150
THCA	507	505	507	496	507	502	506
THYM	124	124	124	123	124	119	124
UCEC	560	540	548	542	547	543	538
UCS	57	56	57	57	57	56	57
UVM	80	80	80	80	80	80	80
Totals	11353	10840	11160	10323	10853	10088	10049

Results

Sample Heatmaps

TCGA-ACC

Figure 1. This figure depicts the distribution of available data on a per-participant basis.

TCGA-BLCA

Figure 2. This figure depicts the distribution of available data on a per-participant basis.

TCGA-BRCA

Figure 3. This figure depicts the distribution of available data on a per-participant basis.

TCGA-CESC

Figure 4. This figure depicts the distribution of available data on a per-participant basis.

TCGA-CHOL

Figure 5. This figure depicts the distribution of available data on a per-participant basis.

TCGA-COAD

Figure 6. This figure depicts the distribution of available data on a per-participant basis.

TCGA-COADREAD

Figure 7. This figure depicts the distribution of available data on a per-participant basis.

TCGA-DLBC

Figure 8. This figure depicts the distribution of available data on a per-participant basis.

TCGA-ESCA

Figure 9. This figure depicts the distribution of available data on a per-participant basis.

TCGA-GBM

Figure 10. This figure depicts the distribution of available data on a per-participant basis.

TCGA-GBMLGG

Figure 11. This figure depicts the distribution of available data on a per-participant basis.

TCGA-HNSC

Figure 12. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KICH

Figure 13. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIPAN

Figure 14. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIRC

Figure 15. This figure depicts the distribution of available data on a per-participant basis.

TCGA-KIRP

Figure 16. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LAML

Figure 17. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LGG

Figure 18. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LIHC

Figure 19. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LUAD

Figure 20. This figure depicts the distribution of available data on a per-participant basis.

TCGA-LUSC

Figure 21. This figure depicts the distribution of available data on a per-participant basis.

TCGA-MESO

Figure 22. This figure depicts the distribution of available data on a per-participant basis.

TCGA-OV

Figure 23. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PAAD

Figure 24. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PANGI

Figure 25. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PCPG

Figure 26. This figure depicts the distribution of available data on a per-participant basis.

TCGA-PRAD

Figure 27. This figure depicts the distribution of available data on a per-participant basis.

TCGA-READ

Figure 28. This figure depicts the distribution of available data on a per-participant basis.

TCGA-SARC

Figure 29. This figure depicts the distribution of available data on a per-participant basis.

TCGA-SKCM

Figure 30. This figure depicts the distribution of available data on a per-participant basis.

TCGA-STAD

Figure 31. This figure depicts the distribution of available data on a per-participant basis.

TCGA-STES

Figure 32. This figure depicts the distribution of available data on a per-participant basis.

TCGA-TGCT

Figure 33. This figure depicts the distribution of available data on a per-participant basis.

TCGA-THCA

Figure 34. This figure depicts the distribution of available data on a per-participant basis.

TCGA-THYM

Figure 35. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UCEC

Figure 36. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UCS

Figure 37. This figure depicts the distribution of available data on a per-participant basis.

TCGA-UVM

Figure 38. This figure depicts the distribution of available data on a per-participant basis.

Filtered Samples

Redactions

Replicate Samples

Blacklisted Samples

FFPE Cases

[NOT YET IMPLEMENTED] Additional Annotations from the DCC's Annotations Manager

Methods & Data

Redactions and Other Annotations

Annotation data was taken from the TCGA Data Portal using the query string:

https://tcga-data.nci.nih.gov/annotations/resources/searchannotations/json?item=TCGA

Redaction information was generated by filtering for the annotationClassificationName "Redaction"

FFPE information was generated by filtering for "FFPE" in annotation note text

Additional FFPEs were garnered from clinical data

Remaining annotations were sorted into sections by annotationClassificationName
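
The filtering steps above can be sketched in Python. This is a minimal illustration, not the GDAC's actual code. It assumes the query response has already been parsed into a list of dictionaries, that each record carries its classification under annotationClassificationName (the field named above), and that the free-text note is stored under a hypothetical "note" key. The helper name partition_annotations is invented for this example, and treating the three groups as mutually exclusive is also an assumption.

```python
from collections import defaultdict


def partition_annotations(records, note_key="note"):
    """Split annotation records into redactions, FFPE notes, and the rest.

    `records` is a list of dicts; `note_key` is an assumed field name for
    the annotation's free-text note.
    """
    redactions = []
    ffpe = []
    remaining = defaultdict(list)  # classification name -> records

    for rec in records:
        classification = rec.get("annotationClassificationName", "")
        note_text = rec.get(note_key) or ""
        if classification == "Redaction":
            redactions.append(rec)
        elif "FFPE" in note_text:
            ffpe.append(rec)
        else:
            # Everything else is grouped by its classification name.
            remaining[classification].append(rec)

    return redactions, ffpe, dict(remaining)


# Invented records, for illustration only.
sample_records = [
    {"annotationClassificationName": "Redaction", "note": "Subject withdrew consent"},
    {"annotationClassificationName": "Notification", "note": "FFPE tissue"},
    {"annotationClassificationName": "Observation", "note": "Item is noncanonical"},
]
redactions, ffpe, remaining = partition_annotations(sample_records)
print(len(redactions), len(ffpe), sorted(remaining))
```

FFPE cases drawn from clinical data are not covered by this sketch, because the report does not say which clinical field they come from.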

Preprocessors

mRNA Preprocessor

The mRNA preprocess median module chooses the matrix for the platform (Affymetrix HG U133, Affymetrix Exon Array, or Agilent Gene Expression) with the largest number of samples.
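
As a rough sketch of this selection rule, the Python below picks the platform whose matrix covers the most distinct samples. The function name pick_largest_platform and the sample IDs are hypothetical. The report does not say how ties are broken, so the sketch simply keeps the first platform encountered.

```python
def pick_largest_platform(sample_ids_by_platform):
    """Return the platform name whose matrix covers the most distinct samples."""
    if not sample_ids_by_platform:
        raise ValueError("no platforms supplied")
    # max() keeps the first platform seen when counts are tied (an assumption;
    # the report does not describe tie-breaking).
    return max(
        sample_ids_by_platform,
        key=lambda name: len(set(sample_ids_by_platform[name])),
    )


# Invented sample IDs, for illustration only.
platform_samples = {
    "Affymetrix HG U133": ["S1", "S2"],
    "Affymetrix Exon Array": ["S1"],
    "Agilent Gene Expression": ["S1", "S2", "S3"],
}
chosen = pick_largest_platform(platform_samples)
print(chosen)
```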

mRNAseq Preprocessor

The mRNAseq preprocessor picks the "scaled_estimate" (RSEM) value from the Illumina HiSeq/GA2 mRNAseq level_3 (v2) data set and builds a log2-transformed mRNAseq matrix for downstream analysis. If samples overlap between the two platforms, the Illumina HiSeq samples are selected. The pipeline also creates a log2-transformed RPKM matrix from the HiSeq/GA2 mRNAseq level 3 (v1) data set.
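
The overlap rule can be illustrated with a short Python sketch. It assumes each platform's data is held as a mapping from sample ID to that sample's values. The function name merge_platforms and the sample data are hypothetical, and the real pipeline may represent its matrices differently.

```python
def merge_platforms(hiseq, ga):
    """Combine per-sample values from two platforms, preferring HiSeq.

    Both arguments map a sample ID to that sample's expression values.
    """
    merged = dict(ga)     # start from the GA samples
    merged.update(hiseq)  # HiSeq entries replace any overlapping sample
    return merged


# Invented values, for illustration only: S1 was profiled on both platforms.
hiseq_samples = {"S1": [0.5, 0.25]}
ga_samples = {"S1": [0.1, 0.2], "S2": [0.3, 0.4]}
merged = merge_platforms(hiseq_samples, ga_samples)
print(merged)
```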

miRseq Preprocessor

The miRseq preprocessor picks the "RPM" (reads per million miRNA precursor reads) value from the Illumina HiSeq/GA miRseq Level_3 data set and builds a matrix of log2-transformed values.
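
A minimal Python sketch of the log2 step follows. The function name log2_transform, the miRNA ID, and the RPM values are invented for illustration. The report does not say how zero or missing values are handled, so mapping them to NaN here is purely an assumption.

```python
import math


def log2_transform(matrix):
    """Return a new matrix with every positive value replaced by its log2.

    `matrix` maps a miRNA ID to a list of RPM values, one per sample.
    Zero or missing values become NaN (an assumed convention).
    """
    return {
        mirna: [
            math.log2(v) if v is not None and v > 0 else math.nan
            for v in values
        ]
        for mirna, values in matrix.items()
    }


# Invented RPM values, for illustration only.
rpm = {"hsa-mir-21": [1024.0, 0.0, 8.0]}
log2_rpm = log2_transform(rpm)
print(log2_rpm)
```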

Methylation Preprocessor

The methylation preprocessor filters methylation data for use in downstream pipelines. To learn more about this preprocessor, please visit the documentation.
