Mutation Analysis (MutSigCV v0.9)
Pan-kidney cohort (KICH+KIRC+KIRP) (Primary solid tumor)
02 April 2015  |  analyses__2015_04_02
Maintainer Information
Maintained by David Heiman (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2015): Mutation Analysis (MutSigCV v0.9). Broad Institute of MIT and Harvard. doi:10.7908/C1QZ2915
Overview
Introduction

This report describes the mutational landscape and properties of a given individual set, and ranks genes and gene sets by mutational significance. The results in this report were generated with MutSigCV v0.9.

  • Working with individual set: KIPAN-TP

  • Number of patients in set: 775

Input

The input for this pipeline is a set of individuals with the following files associated for each:

  1. An annotated .maf file describing the mutations called for the respective individual, and their properties.

  2. A .wig file that contains information about the coverage of the sample.

Summary
  • MAF used for this analysis: KIPAN-TP.final_analysis_set.maf

  • Blacklist used for this analysis: pancan_mutation_blacklist.v14.hg19.txt

  • Significantly mutated genes (q ≤ 0.1): 78

Results
Target Coverage for Each Individual

The x axis represents the samples. The y axis represents the exons, one row per exon, sorted by average coverage across samples. Exons with exactly the same average coverage are sorted next by the %GC of the exon. (This secondary sort is especially useful for the zero-coverage exons at the bottom.) If the figure is unpopulated, full coverage is assumed (e.g. MutSigCV doesn't use WIGs and assumes full coverage).
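
The row ordering described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary inputs and the function name `sort_exons` are hypothetical, descending coverage is inferred from the zero-coverage exons sitting at the bottom, and the direction of the %GC tie-break (ascending here) is a guess, since the report does not state it.

```python
def sort_exons(coverage, gc_percent):
    """Order exon IDs by mean coverage (highest first), then by %GC.

    coverage:   dict mapping exon ID -> list of per-sample coverage values
    gc_percent: dict mapping exon ID -> %GC of that exon
    Both inputs are hypothetical stand-ins for data derived from the .wig files.
    """
    def mean(values):
        return sum(values) / len(values)

    # Negating the mean gives a descending primary sort; %GC breaks exact ties
    # (ascending here, which is an assumption).
    return sorted(coverage, key=lambda exon: (-mean(coverage[exon]), gc_percent[exon]))


example_coverage = {"exonA": [0, 0], "exonB": [10, 20], "exonC": [0, 0]}
example_gc = {"exonA": 60.0, "exonB": 40.0, "exonC": 30.0}
print(sort_exons(example_coverage, example_gc))
```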

Figure 1. 

Distribution of Mutation Counts, Coverage, and Mutation Rates Across Samples

Figure 2.  Patient counts and rates file used to generate this plot: KIPAN-TP.patients.counts_and_rates.txt

Lego Plots

The mutation spectrum is depicted in the lego plots below in which the 96 possible mutation types are subdivided into six large blocks, color-coded to reflect the base substitution type. Each large block is further subdivided into the 16 possible pairs of 5' and 3' neighbors, as listed in the 4x4 trinucleotide context legend. The height of each block corresponds to the mutation frequency for that kind of mutation (counts of mutations normalized by the base coverage in a given bin). The shape of the spectrum is a signature for dominant mutational mechanisms in different tumor types.
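
To make the 96-bin layout concrete, the sketch below enumerates six substitution types times 16 neighbor pairs and normalizes counts by coverage per bin. The labels such as `A[C>T]G`, the choice to write each substitution relative to the pyrimidine base, and the function names are assumptions made for illustration; the report does not specify how the bins are labeled internally.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types, written relative to the pyrimidine base (an assumed convention).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types():
    """Return the 96 bins: each substitution type paired with a 5' and a 3' neighbor."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def normalized_rates(counts, coverage):
    """Divide mutation counts by base coverage for every bin with nonzero coverage."""
    return {
        bin_name: counts.get(bin_name, 0) / covered
        for bin_name, covered in coverage.items()
        if covered > 0
    }


print(len(mutation_types()))
```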

Figure 3.  SNV Mutation rate lego plot for entire set. Each bin is normalized by base coverage for that bin. Colors represent the six SNV types on the upper right. The three-base context for each mutation is labeled in the 4x4 legend on the lower right. The fractional breakdown of SNV counts is shown in the pie chart on the upper left. If this figure is blank, not enough information was provided in the MAF to generate it.

Figure 4.  SNV Mutation rate lego plots for 4 slices of mutation allele fraction (0<=AF<0.1, 0.1<=AF<0.25, 0.25<=AF<0.5, and 0.5<=AF). The color code and three-base context legends are the same as in the previous figure. If this figure is blank, not enough information was provided in the MAF to generate it.
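
The four allele-fraction slices in the caption above can be expressed as a small binning function. This is a sketch under assumptions: allele fraction is taken as alternate reads divided by total reads, and the read-count arguments are hypothetical (the report only says that the MAF must carry enough information for the figure to be drawn).

```python
import bisect

# Slice boundaries copied from the figure caption; each lower bound is inclusive.
AF_EDGES = [0.1, 0.25, 0.5]
AF_LABELS = ["0<=AF<0.1", "0.1<=AF<0.25", "0.25<=AF<0.5", "0.5<=AF"]


def af_slice(alt_count, ref_count):
    """Return the slice label for one mutation, or None if there are no reads."""
    total = alt_count + ref_count
    if total == 0:
        return None
    allele_fraction = alt_count / total  # assumed definition of AF
    # bisect_right places a value equal to an edge into the higher slice,
    # which matches the inclusive lower bounds in the caption.
    return AF_LABELS[bisect.bisect_right(AF_EDGES, allele_fraction)]


print(af_slice(12, 48))
```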

CoMut Plot

Figure 5.  The matrix in the center of the figure represents individual mutations in patient samples, color-coded by type of mutation, for the significantly mutated genes. The rate of synonymous and non-synonymous mutations is displayed at the top of the matrix. The barplot on the left of the matrix shows the number of mutations in each gene. The percentages represent the fraction of tumors with at least one mutation in the specified gene. The barplot to the right of the matrix displays the q-values for the most significantly mutated genes. The purple boxplots below the matrix (only displayed if required columns are present in the provided MAF) represent the distributions of allelic fractions observed in each sample. The plot at the bottom represents the base substitution distribution of individual samples, using the same categories that were used to calculate significance.

Significantly Mutated Genes

Column Descriptions:

  • nnon = number of (nonsilent) mutations in this gene across the individual set

  • npat = number of patients (individuals) with at least one nonsilent mutation

  • nsite = number of unique sites having a nonsilent mutation

  • nflank = number of noncoding mutations from this gene's flanking region, across the individual set

  • nsil = number of silent mutations in this gene across the individual set

  • p = p-value (overall)

  • q = q-value, False Discovery Rate (Benjamini-Hochberg procedure)
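
For readers unfamiliar with the Benjamini-Hochberg procedure named in the q column, a minimal pure-Python sketch follows. It is a generic textbook implementation, not the code used by the pipeline, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over this rank and all larger ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The running minimum means that genes with neighboring ranks can end up with identical q-values, which is consistent with the repeated q-values among the top rows of Table 1.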

Table 1.  A Ranked List of Significantly Mutated Genes. Number of significant genes found: 78. Number of genes displayed: 35. Click on a gene name to display its stick figure depicting the distribution of mutations and mutation types across the chosen gene (this feature may not be available for all significant genes).

gene Nnon Nsil Nflank nnon npat nsite nsil nflank nnei fMLE p score time q
PTEN 754850 181350 5916 42 35 37 1 0 20 1.1 5.6e-16 180 0.65 1e-11
PBRM1 3099225 806000 20604 162 158 147 5 1 20 1.1 3.9e-15 800 0.6 2.6e-11
VHL 234050 73625 1496 251 242 142 7 0 13 1.8 4.7e-15 1300 0.72 2.6e-11
TP53 732375 213900 7004 55 44 43 2 0 4 0.85 5.7e-15 180 0.62 2.6e-11
SETD2 3823075 1023775 12546 76 69 72 4 3 7 1.4 7.1e-15 290 0.64 2.6e-11
C6orf25 485925 117800 4250 19 19 1 0 0 20 1.2 1.9e-14 110 0.63 5.7e-11
BAP1 1222950 367350 10030 54 51 49 1 0 20 0.9 2.2e-14 250 0.78 5.8e-11
GCNT2 1854575 514600 3638 22 22 6 1 0 20 0.73 2.6e-14 110 0.81 6e-11
EFNB3 501425 178250 2550 12 12 1 0 0 20 0.57 8e-14 73 0.61 1.6e-10
KDM5C 2444350 733150 14654 32 32 32 2 0 20 1.4 2.2e-12 140 0.61 3.9e-09
CCDC91 817625 204600 7718 15 15 3 0 0 20 0.92 5.8e-12 80 0.62 9.6e-09
NAPSA 699050 235600 5508 14 13 4 0 0 20 0.58 6.6e-12 74 0.7 1e-08
DNMT1 2821775 768025 24548 25 25 10 0 0 20 0.76 7.5e-11 110 0.68 1.1e-07
NF2 992775 250325 9520 16 16 15 1 0 20 1.5 5.3e-10 87 0.65 7e-07
PHYH 597525 152675 5440 10 10 2 0 0 20 0.72 2.3e-09 57 0.57 2.8e-06
CIB3 345650 88350 3944 8 8 2 0 0 20 0.23 2.6e-09 49 0.66 2.9e-06
SDHAF2 355725 93000 3162 9 9 1 0 0 20 1 8.2e-09 54 0.6 8.8e-06
MARK4 991225 297600 10132 9 9 2 0 0 20 0.093 3.3e-08 54 0.65 0.000034
NEFH 1292700 357275 2040 21 17 11 2 0 20 0.97 3.8e-08 74 0.67 0.000037
SPRY4 565750 174375 1394 11 11 3 0 0 20 0.87 4.1e-08 56 0.6 0.000037
RRAS2 342550 86800 3570 8 8 1 0 0 20 1 4.6e-08 48 0.66 4e-05
CCDC136 1498850 362700 5406 15 15 2 1 0 20 1.2 7.8e-08 78 0.6 0.000065
SORD 492900 148025 4318 11 11 3 0 0 20 1.4 1.1e-07 55 0.62 9e-05
ARPC2 561875 147250 6528 9 9 3 0 0 20 0.63 1.4e-07 49 0.68 0.00011
TAS2R3 565750 170500 816 14 13 5 0 0 20 0.38 1.8e-07 43 0.59 0.00013
PCGF1 416950 108500 4896 9 8 3 1 0 20 1.1 3e-07 48 0.59 0.00021
ZMAT2 390600 87575 4148 7 7 1 1 0 20 0.82 3.9e-07 43 0.62 0.00026
HNRNPM 1288825 364250 10200 13 13 5 0 0 20 0.61 4e-07 67 0.6 0.00026
PRSS3 454150 134075 3536 12 11 10 7 0 20 1.1 4.4e-07 49 0.61 0.00028
DPEP2 770350 236375 6460 9 9 3 0 0 20 0.62 9.7e-07 47 0.68 0.00059
LETMD1 665725 196075 6256 10 10 5 1 0 20 0.72 1.1e-06 48 0.59 0.00063
PRB1 357275 123225 2074 11 9 6 4 0 12 1.9 1.1e-06 47 0.63 0.00063
DNMT3A 1605800 445625 14756 16 16 12 0 0 18 0.8 1.5e-06 74 0.67 0.00081
CD4 812200 238700 5848 11 10 4 1 0 20 0.93 1.8e-06 51 0.6 0.00097
MRPL10 512275 160425 4216 8 8 3 1 0 20 0.76 2.1e-06 42 0.71 0.0011
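
The table above is whitespace-delimited, so it can be loaded with a few lines of Python. The sketch below copies three rows from Table 1 and derives the fraction of the 775 patients carrying a nonsilent mutation in each gene; the function name `parse_table` and the choice to convert every numeric column to float are illustrative assumptions.

```python
HEADER = "gene Nnon Nsil Nflank nnon npat nsite nsil nflank nnei fMLE p score time q"
# Three rows copied verbatim from Table 1.
ROWS = """\
PTEN 754850 181350 5916 42 35 37 1 0 20 1.1 5.6e-16 180 0.65 1e-11
VHL 234050 73625 1496 251 242 142 7 0 13 1.8 4.7e-15 1300 0.72 2.6e-11
MRPL10 512275 160425 4216 8 8 3 1 0 20 0.76 2.1e-06 42 0.71 0.0011
"""
N_PATIENTS = 775  # number of patients in the KIPAN-TP set


def parse_table(header, text):
    """Parse whitespace-delimited rows into dicts keyed by the header names."""
    columns = header.split()
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        record = {"gene": fields[0]}
        record.update({name: float(value) for name, value in zip(columns[1:], fields[1:])})
        records.append(record)
    return records


records = parse_table(HEADER, ROWS)
# Keep genes at the report's significance threshold (q <= 0.1).
significant = [r for r in records if r["q"] <= 0.1]
for r in significant:
    print(r["gene"], f"{r['npat'] / N_PATIENTS:.1%}", r["q"])
```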
Methods & Data
Methods

In brief, we tabulate the number of mutations and the number of covered bases for each gene. The counts are broken down by mutation context category: four context categories that are discovered by MutSig, and one for indel and 'null' mutations, which include indels, nonsense mutations, splice-site mutations, and non-stop (read-through) mutations. For each gene, we calculate the probability of seeing the observed constellation of mutations, i.e. the product P1 x P2 x ... x Pm, or a more extreme one, given the background mutation rates calculated across the dataset. [1]
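
As a rough illustration of the "observed or more extreme" idea, the sketch below computes an upper-tail probability for one gene in a single mutation category under a binomial model. This is a deliberate simplification and not the MutSigCV procedure: it does not combine categories into the product P1 x P2 x ... x Pm, it says nothing about how the background rates are estimated, and the function names and the example background rate are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k events, computed in log space."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def upper_tail(n_mut, n_bases, rate):
    """P(X >= n_mut) for X ~ Binomial(n_bases, rate); requires 0 < rate < 1."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    if n_mut <= 0:
        return 1.0
    # Sum the tail directly rather than taking 1 - CDF, which loses precision
    # when the tail probability is very small.
    total = sum(
        math.exp(log_binom_pmf(k, n_bases, rate))
        for k in range(n_mut, n_bases + 1)
    )
    return min(1.0, total)


# Hypothetical inputs: 5 mutations over 10,000 covered bases,
# with an assumed background rate of 1e-4 mutations per base.
print(upper_tail(5, 10_000, 1e-4))
```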

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References
[1] TCGA, Integrated genomic analyses of ovarian carcinoma, Nature 474:609-615 (2011)