Mutation Analysis (MutSigCV v0.9)

Colon Adenocarcinoma (Primary solid tumor)

28 January 2016 | analyses__2016_01_28

Maintainer Information

Maintained by David Heiman (Broad Institute)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2016): Mutation Analysis (MutSigCV v0.9). Broad Institute of MIT and Harvard. doi:10.7908/C18P5ZTP

Overview

Introduction

This report describes the mutational landscape and properties of a given individual set, and ranks genes and gene sets by mutational significance. MutSigCV v0.9 was used to generate the results in this report.

Working with individual set: COAD-TP
Number of patients in set: 367

Input

The input for this pipeline is a set of individuals, each with the following associated files:

An annotated .maf file describing the mutations called for the respective individual, and their properties.
A .wig file that contains information about the coverage of the sample.

Summary

MAF used for this analysis: COAD-TP.final_analysis_set.maf
Blacklist used for this analysis: pancan_mutation_blacklist.v14.hg19.txt
Significantly mutated genes (q ≤ 0.1): 93

Results

Target Coverage for Each Individual

The x axis represents the samples. The y axis represents the exons, one row per exon, sorted by average coverage across samples. Exons with identical average coverage are further sorted by the %GC of the exon (this secondary sort is especially useful for the zero-coverage exons at the bottom). If the figure is unpopulated, full coverage is assumed (e.g. MutSigCV does not use WIGs and assumes full coverage).
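The row ordering described above can be sketched as follows. This is an illustrative sketch, not the Firehose plotting code: the `Exon` record, its field names, and the choice to break ties by ascending %GC are assumptions (the report does not state the direction of the secondary sort).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Exon:
    # Hypothetical record: exon name, per-sample coverage, and GC fraction.
    name: str
    coverage: list[float]
    gc: float


def order_exons(exons: list[Exon]) -> list[Exon]:
    """Sort exons by mean coverage (highest first), breaking ties by GC fraction.

    Zero-coverage exons sink to the bottom, where the secondary GC key
    orders them. Ascending GC for ties is an assumption.
    """
    return sorted(exons, key=lambda e: (-mean(e.coverage), e.gc))


exons = [
    Exon("exon_a", [0.0, 0.0], 0.62),
    Exon("exon_b", [30.0, 40.0], 0.45),
    Exon("exon_c", [0.0, 0.0], 0.38),
]
print([e.name for e in order_exons(exons)])  # ['exon_b', 'exon_c', 'exon_a']
```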

Figure 1.

Distribution of Mutation Counts, Coverage, and Mutation Rates Across Samples

Figure 2. Patient counts and rates file used to generate this plot: COAD-TP.patients.counts_and_rates.txt

Lego Plots

The lego plots below depict the mutation spectrum. The 96 possible mutation types are grouped into six large blocks, color-coded to reflect the base substitution type. Each large block is further subdivided into the 16 possible pairs of 5' and 3' neighbors, as listed in the 4x4 trinucleotide context legend. The height of each block corresponds to the mutation frequency for that kind of mutation (the count of mutations normalized by the base coverage in a given bin). The shape of the spectrum serves as a signature of the dominant mutational mechanisms in different tumor types.
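The 6 x 16 = 96 channel layout and the per-bin coverage normalization can be sketched as below. This is a minimal illustration, not the code behind Figures 3 and 4. It assumes the common convention of writing each SNV with a pyrimidine (C or T) reference base, which is what yields six substitution types, and it assumes coverage is supplied as a dictionary keyed by reference trinucleotide. The function names `lego_channels`, `to_channel`, and `channel_rates` are hypothetical.

```python
import itertools

BASES = "ACGT"
# Six SNV classes, written with the pyrimidine (C or T) as the reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def lego_channels():
    """All 96 channels: 6 substitution types x 16 (5' base, 3' base) pairs."""
    return [(sub, left, right)
            for sub in SUBSTITUTIONS
            for left, right in itertools.product(BASES, BASES)]


def to_channel(left, ref, alt, right):
    """Map an SNV in trinucleotide context to its pyrimidine-reference channel."""
    if ref in "AG":
        # Reverse-complement: the strand flips, so the 5' and 3' neighbors swap.
        left, ref, alt, right = (right.translate(COMPLEMENT), ref.translate(COMPLEMENT),
                                 alt.translate(COMPLEMENT), left.translate(COMPLEMENT))
    return (f"{ref}>{alt}", left, right)


def channel_rates(counts, coverage):
    """Per-channel rate: mutation count divided by covered bases of that context.

    counts:   {(sub, left, right): number of mutations}   (assumed format)
    coverage: {reference trinucleotide such as "ACG": covered bases}   (assumed format)
    """
    rates = {}
    for sub, left, right in lego_channels():
        bases = coverage.get(left + sub[0] + right, 0)
        n = counts.get((sub, left, right), 0)
        rates[(sub, left, right)] = n / bases if bases else 0.0
    return rates
```

For example, a G>A change in the context A_G_C on the given strand is read as C>T in the context G_C_T on the opposite strand, so `to_channel("A", "G", "A", "C")` returns `("C>T", "G", "T")`.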

Figure 3. SNV mutation rate lego plot for the entire set. Each bin is normalized by the base coverage for that bin. Colors represent the six SNV types shown on the upper right. The three-base context for each mutation is labeled in the 4x4 legend on the lower right. The fractional breakdown of SNV counts is shown in the pie chart on the upper left. If this figure is blank, not enough information was provided in the MAF to generate it.

Figure 4. SNV mutation rate lego plots for four slices of mutation allele fraction (0<=AF<0.1, 0.1<=AF<0.25, 0.25<=AF<0.5, and 0.5<=AF). The color code and three-base context legends are the same as in the previous figure. If this figure is blank, not enough information was provided in the MAF to generate it.
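Assigning each mutation to one of the four allele-fraction slices above can be sketched with half-open intervals. This sketch assumes the allele fraction is computed from tumor read counts as alt / (alt + ref), for example from read-count columns such as `t_alt_count` and `t_ref_count` when the MAF provides them; the helper names are hypothetical.

```python
import bisect

# Lower bounds of slices 2-4; slice 1 is 0 <= AF < 0.1.
AF_BREAKS = [0.1, 0.25, 0.5]
AF_LABELS = ["0<=AF<0.1", "0.1<=AF<0.25", "0.25<=AF<0.5", "0.5<=AF"]


def allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads supporting the alternate allele (assumed definition)."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total


def af_slice(af: float) -> str:
    """Return the label of the half-open AF slice containing af."""
    if af < 0:
        raise ValueError("allele fraction must be non-negative")
    # bisect_right places a value equal to a break into the higher slice,
    # matching the [lower, upper) intervals in the figure legend.
    return AF_LABELS[bisect.bisect_right(AF_BREAKS, af)]
```

Using `bisect_right` keeps the boundary handling in one place: an AF of exactly 0.25 falls in the third slice, as the legend's `0.25<=AF<0.5` requires.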

CoMut Plot

Figure 5. The matrix in the center of the figure represents individual mutations in patient samples, color-coded by type of mutation, for the significantly mutated genes. The rate of synonymous and non-synonymous mutations is displayed at the top of the matrix. The barplot on the left of the matrix shows the number of mutations in each gene. The percentages represent the fraction of tumors with at least one mutation in the specified gene. The barplot to the right of the matrix displays the q-values for the most significantly mutated genes. The purple boxplots below the matrix (displayed only if the required columns are present in the provided MAF) represent the distributions of allelic fractions observed in each sample. The plot at the bottom represents the base substitution distribution of individual samples, using the same categories that were used to calculate significance.

Significantly Mutated Genes

Column Descriptions:

nnon = number of nonsilent mutations in this gene across the individual set
npat = number of patients (individuals) with at least one nonsilent mutation
nsite = number of unique sites having a nonsilent mutation
nflank = number of noncoding mutations from this gene's flanking region, across the individual set
nsil = number of silent mutations in this gene across the individual set
p = p-value (overall)
q = q-value, False Discovery Rate (Benjamini-Hochberg procedure)
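The Benjamini-Hochberg q-values in the `q` column can be sketched as follows. This is a generic implementation of the procedure, not the MutSigCV source. Note that `m` must be the total number of genes tested, which this report does not state, so the q-values in Table 1 cannot be recomputed from the 35 displayed rows alone. The cutoff 0.1 matches the significance threshold in the Summary.

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # q_(k) = min over j >= k of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
significant = [i for i, v in enumerate(q) if v <= 0.1]  # genes at q <= 0.1
```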

Table 1. A ranked list of significantly mutated genes. Number of significant genes found: 93. Number of genes displayed: 35. Stick figures depicting the distribution of mutations and mutation types across individual genes follow the table (these may not be available for all significant genes).

gene	Nnon	Nsil	Nflank	nnon	npat	nsite	nsil	nflank	nnei	fMLE	p	score	time	q
MLH1	874194	246991	3456	63	53	45	14	0	20	1.1	0	180	0.51	0
PTEN	357458	85878	1566	294	154	184	117	0	20	3.7	0	420	0.64	0
RNF43	611055	193409	1575	55	46	27	1	0	8	0.71	0	200	0.59	0
TP53	346815	101292	1854	338	232	189	63	0	4	8.5	0	640	0.8	0
RB1	1045216	275617	4374	132	88	86	39	0	20	1.4	5.6e-16	250	0.76	2e-12
NRAS	169187	44774	738	34	33	15	11	0	20	1.3	1.1e-15	100	0.39	3.4e-12
APC	2448624	686290	2619	667	296	373	167	0	4	8.8	1.4e-15	970	0.34	3.8e-12
CDH1	717118	213227	3114	88	74	65	17	0	20	1.4	1.9e-15	220	0.41	3.8e-12
TXNDC2	474898	132487	351	25	22	9	2	0	20	0.85	1.9e-15	110	0.48	3.8e-12
ARID1A	1639389	482972	3384	83	72	63	17	0	2	1.2	3e-15	230	0.32	4.2e-12
PIK3CA	953833	244055	3564	288	175	144	114	0	20	3.1	3.4e-15	360	0.64	4.2e-12
MUC4	1670584	541692	3618	133	94	61	21	0	2	1.8	3.7e-15	370	0.83	4.2e-12
CTNNB1	674913	201850	2538	100	74	65	40	0	20	1.7	3.8e-15	210	0.35	4.2e-12
KIT	865386	234880	3771	140	96	108	49	0	20	2.1	3.8e-15	200	0.35	4.2e-12
KRAS	222035	54316	954	182	168	37	14	4	1	6.4	3.8e-15	360	1	4.2e-12
NF2	470127	118541	2520	76	62	59	24	0	20	1	3.9e-15	220	1.2	4.2e-12
VHL	110834	34865	396	85	63	59	38	0	13	2.5	3.9e-15	190	0.61	4.2e-12
FBXW7	712714	195611	2151	118	102	71	42	2	20	2.6	4.3e-15	290	0.35	4.2e-12
B2M	105329	29360	576	21	16	16	1	0	20	0.65	4.4e-15	76	0.38	4.2e-12
SMAD4	485908	135423	1998	166	119	107	54	0	20	2.2	4.6e-15	320	0.52	4.2e-12
BRAF	633809	180931	3051	111	96	50	38	0	19	1.7	6.7e-15	230	0.35	5.8e-12
SOX9	345714	101659	468	40	36	37	0	0	20	1.2	7.7e-15	180	0.75	6.4e-12
STK11	182766	52848	423	36	32	24	12	0	20	1.2	1e-14	110	0.67	8e-12
EGFR	1130360	309014	5292	105	75	90	42	0	20	1.3	1.9e-13	170	0.68	1.4e-10
PIK3R1	684088	175793	3105	52	43	35	12	1	20	1.4	4.1e-13	130	0.63	3e-10
RUNX1	282223	88447	1755	41	37	29	15	0	13	1.3	5.6e-12	100	0.88	4e-09
BMPR2	898783	258001	2340	40	35	37	4	0	20	0.88	1.2e-11	120	0.73	8.4e-09
CD58	198180	51013	810	15	14	13	0	0	20	0.57	7.5e-11	62	0.36	4.9e-08
SELPLG	315987	113770	279	17	16	9	1	0	20	0.82	4.2e-10	77	0.76	2.7e-07
ZFP36L2	185702	59821	144	14	14	11	1	0	20	1.2	9.1e-10	76	0.43	5.6e-07
CRIPAK	361128	120009	207	18	18	10	2	0	20	0.83	1.1e-09	80	0.92	6.6e-07
GNG12	64592	17249	396	8	8	3	0	0	20	0.79	2.2e-09	44	0.86	1.3e-06
FGFR2	767030	214695	4617	50	45	44	17	0	18	1.4	8.2e-09	120	1.4	4.6e-06
CEBPA	64959	19818	36	15	15	14	5	0	20	1	1.1e-08	48	0.73	5.8e-06
COL6A5	495450	130285	756	46	33	42	8	0	7	1.4	3.7e-08	100	0.34	0.000019
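The tab-separated table above can be loaded and filtered at the report's q ≤ 0.1 threshold with a short sketch like the one below. It uses only a few of the table's columns, and it computes the fraction of the 367 patients with at least one nonsilent mutation as `npat / 367`. The CoMut percentages in Figure 5 count tumors with at least one mutation of any kind, so they may differ from these nonsilent-only fractions. The function names are hypothetical.

```python
import csv
import io

N_PATIENTS = 367  # patients in COAD-TP, from the report header


def load_smg_table(text: str) -> list[dict]:
    """Parse the tab-separated gene table (header row first)."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        row["npat"] = int(row["npat"])
        row["q"] = float(row["q"])
    return rows


def patient_fractions(rows, q_cutoff=0.1):
    """Fraction of patients with >= 1 nonsilent mutation, for genes with q <= cutoff."""
    return {r["gene"]: r["npat"] / N_PATIENTS for r in rows if r["q"] <= q_cutoff}


# A subset of columns and rows from Table 1.
table = (
    "gene\tnnon\tnpat\tq\n"
    "TP53\t338\t232\t0\n"
    "APC\t667\t296\t3.8e-12\n"
    "KRAS\t182\t168\t4.2e-12\n"
)
fractions = patient_fractions(load_smg_table(table))
print({g: round(f, 3) for g, f in fractions.items()})
# {'TP53': 0.632, 'APC': 0.807, 'KRAS': 0.458}
```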

MLH1

Figure S1. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene MLH1.

PTEN

Figure S2. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene PTEN.

RNF43

Figure S3. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene RNF43.

TP53

Figure S4. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene TP53.

RB1

Figure S5. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene RB1.

NRAS

Figure S6. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene NRAS.

APC

Figure S7. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene APC.

CDH1

Figure S8. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene CDH1.

TXNDC2

Figure S9. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene TXNDC2.

ARID1A

Figure S10. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene ARID1A.

PIK3CA

Figure S11. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene PIK3CA.

MUC4

Figure S12. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene MUC4.

CTNNB1

Figure S13. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene CTNNB1.

KIT

Figure S14. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene KIT.

KRAS

Figure S15. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene KRAS.

NF2

Figure S16. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene NF2.

VHL

Figure S17. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene VHL.

FBXW7

Figure S18. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene FBXW7.

B2M

Figure S19. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene B2M.

SMAD4

Figure S20. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene SMAD4.

BRAF

Figure S21. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene BRAF.

SOX9

Figure S22. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene SOX9.

STK11

Figure S23. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene STK11.

EGFR

Figure S24. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene EGFR.

PIK3R1

Figure S25. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene PIK3R1.

RUNX1

Figure S26. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene RUNX1.

BMPR2

Figure S27. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene BMPR2.

CD58

Figure S28. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene CD58.

SELPLG

Figure S29. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene SELPLG.

ZFP36L2

Figure S30. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene ZFP36L2.

CRIPAK

Figure S31. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene CRIPAK.

GNG12

Figure S32. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene GNG12.

FGFR2

Figure S33. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene FGFR2.

CEBPA

Figure S34. This figure depicts the distribution of mutations and mutation types across the significantly mutated gene CEBPA.

Methods & Data

Methods

In brief, we tabulate the number of mutations and the number of covered bases for each gene. The counts are broken down by mutation context category: four context categories discovered by MutSig, and one category for 'null' mutations, which comprise indels, nonsense mutations, splice-site mutations, and non-stop (read-through) mutations. For each gene, we calculate the probability of seeing the observed constellation of mutations, i.e. the product $P_1 \times P_2 \times \cdots \times P_m$ over the $m$ categories, where $P_i$ is the probability of the number of mutations observed in category $i$, or a more extreme constellation, given the background mutation rates calculated across the dataset [1].
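The "observed constellation or a more extreme one" test can be illustrated with a simplified sketch. It assumes that each category's count is binomial in that category's covered bases with a fixed background rate $0 < \mu < 1$, and it counts a constellation as "at least as extreme" when its probability is no larger than the observed product. This is not the MutSigCV implementation, which uses a more elaborate background model. The function names are hypothetical. The sketch sums the finitely many constellations that are strictly more likely than the observed one and returns the complement. A constellation can only beat the observed product if every factor does, because all other factors are at most 1. The subtraction `1 - total` loses precision for p-values below roughly 1e-15.

```python
import itertools
import math


def binom_pmf(k: int, n: int, mu: float) -> float:
    """Probability of exactly k mutations in n covered bases at background rate mu."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(mu) + (n - k) * math.log1p(-mu))
    return math.exp(log_p)


def likely_counts(n: int, mu: float, threshold: float) -> list[tuple[int, float]]:
    """Counts k (with their pmf) whose pmf exceeds threshold; the pmf is unimodal."""
    mode = math.floor((n + 1) * mu)
    out = []
    for k in range(n + 1):
        p = binom_pmf(k, n, mu)
        if p > threshold:
            out.append((k, p))
        elif k > mode:
            break  # past the mode the pmf only decreases
    return out


def constellation_pvalue(categories: list[tuple[int, float, int]]) -> float:
    """categories: (covered_bases, background_rate, observed_count) per category."""
    p_obs = math.prod(binom_pmf(x, n, mu) for n, mu, x in categories)
    options = [likely_counts(n, mu, p_obs) for n, mu, _ in categories]
    more_likely = 0.0
    for combo in itertools.product(*options):
        prob = math.prod(p for _, p in combo)
        if prob > p_obs:
            more_likely += prob
    return max(0.0, 1.0 - more_likely)


# Hypothetical gene: 1000 covered bases in one category, rate 1e-3, 10 mutations seen.
print(constellation_pvalue([(1000, 1e-3, 10)]))
```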

Download Results

The full results of the analysis summarized in this report can be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or the TCGA Data Coordination Center Portal.

References

[1] The Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma, Nature 474:609-615 (2011).
