This analysis includes 108 tumor samples. In 37 samples, the Benjamini-Hochberg-corrected p-value for enrichment of the APOBEC mutation signature is <=0.05. Of these, 25 have enrichment values >2, which implies that in those samples at least 50% of APOBEC signature mutations were in fact made by APOBEC enzyme(s): the fraction of signature mutations in excess of the random expectation is (enrichment-1)/enrichment, which exceeds 0.5 whenever enrichment is >2.
The column contents and the calculation of the values referred to in the figure legends below are described in the “Readme_columns_in_sum_files.txt” file (see the hyperlink in the Output section).
Figure 1. Fold enrichment of the APOBEC mutagenesis signature over the expected occurrence for random mutagenesis. Fold enrichment values are taken from the “APOBEC_enrich” column and Benjamini-Hochberg-corrected p-values (q-values) are taken from the “BH_Fisher_p-value_tCw” column in the “*_sorted_sum_all_fisher_Pcorr.txt” file. Samples are categorized into color-coded bins based on their q-values and fold enrichment values. Fold enrichment bins are 1 unit wide. All samples with a q-value >0.05 are placed in one bin (black) regardless of the fold APOBEC enrichment in the sample. The maximum fold enrichment for each bin is indicated in the figure legend, with the number of samples in that bin shown in parentheses.
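The binning described in this legend can be sketched roughly as follows. This is an illustration only, not the pipeline's own code: the function name, the bin labels, and the choice to treat each bin's maximum as inclusive are assumptions, and the sample values are made up for the example.

```python
import math
from collections import Counter

def enrichment_bin(fold_enrichment, q_value, q_cutoff=0.05):
    """Return a bin label for one sample (hypothetical helper).

    Samples with q > q_cutoff share a single bin regardless of enrichment.
    Other samples fall into 1-unit-wide bins labelled by the bin maximum.
    Treating the maximum as inclusive is an assumption.
    """
    if q_value > q_cutoff:
        return f"q>{q_cutoff}"
    upper = max(1, math.ceil(fold_enrichment))
    return f"<={upper}"

# Made-up (APOBEC_enrich, BH_Fisher_p-value_tCw) pairs for illustration.
samples = [(2.3, 0.01), (5.0, 0.20), (2.0, 0.05), (1.4, 0.03)]
bin_counts = Counter(enrichment_bin(e, q) for e, q in samples)
```

The per-bin counts in `bin_counts` correspond to the numbers shown in parentheses in the figure legend.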

Figure 2. Fold enrichment of the APOBEC mutagenesis signature over the expected occurrence for random mutagenesis in individual samples. Fold enrichment values are taken from the “APOBEC_enrich” column and Benjamini-Hochberg-corrected p-values are taken from the “BH_Fisher_p-value_tCw” column in the “*_sorted_sum_all_fisher_Pcorr.txt” file. See also the legend to Figure 1.

Figure 3. Relative load of APOBEC signature mutations. The Fraction of Total Mutations values are taken from the “[tCw_to_G+tCw_to_T]_per_mut” column in the “*_sorted_sum_all_fisher_Pcorr.txt” file.

Figure 4. Numbers of APOBEC and non-APOBEC mutations (X-axis) in each sample (Y-axis). All values are calculated from the columns “indels”, “substitutions”, and the columns containing individual substitution types in the “*_sorted_sum_all_fisher_Pcorr.txt” file. “APOBEC to G” – number of tCw→tGw mutations; “APOBEC to T” – number of tCw→tTw mutations; “non-APOBEC C:G” – number of mutations in C:G base pairs not conforming to the APOBEC signature (= total of tCw→tAw plus all mutations in C not occurring in the tCw motif); “A:T” – all mutations in A:T base pairs; “indels” – all indels. All counts include complementary mutations. Samples are ordered by total mutation count in descending order. Since only base substitutions are shown, bars with excessive numbers of non-substitution mutations may be shorter than the two flanking bars.

Figure 5. Minimum estimate of the number of APOBEC-induced mutations in a sample. “APOBEC_MutLoad_MinEstimate” is calculated using the formula [“tCw_to_G+tCw_to_T”]*[(“APOBEC_enrich”-1)/“APOBEC_enrich”], which gives the number of APOBEC signature mutations in excess of what would be expected from random mutagenesis. Calculated values are rounded to the nearest whole number. “APOBEC_MutLoad_MinEstimate” is calculated only for samples with a “BH_Fisher_p-value_tCw” value less than or equal to 0.05, which signifies a statistical over-representation of APOBEC mutagenesis. Samples with a “BH_Fisher_p-value_tCw” value greater than 0.05 receive a value of 0. “APOBEC_MutLoad_MinEstimate” is plotted on a logarithmic scale (with a pseudocount of 1) for better visualization of values across its whole range.
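As a worked illustration of the formula in this legend, a minimal sketch is given below. The function and argument names are hypothetical; only the formula, the 0.05 cutoff, the rounding step, and the pseudocount of 1 come from the legend. How exact .5 ties are rounded is not stated in the source, so Python's built-in `round` is used here as an assumption.

```python
import math

def apobec_mutload_min_estimate(tcw_mutations, apobec_enrich, bh_p_value, cutoff=0.05):
    """Minimum estimate of APOBEC-induced mutations (hypothetical helper).

    tcw_mutations -- value of the "tCw_to_G+tCw_to_T" column
    apobec_enrich -- value of the "APOBEC_enrich" column
    bh_p_value    -- value of the "BH_Fisher_p-value_tCw" column
    """
    if bh_p_value > cutoff:
        # No statistical over-representation: the estimate is set to 0.
        return 0
    # Fraction of signature mutations in excess of random expectation.
    excess_fraction = (apobec_enrich - 1) / apobec_enrich
    # Tie-breaking for exact .5 values is an assumption (built-in round).
    return round(tcw_mutations * excess_fraction)

def plotted_value(min_estimate):
    """Log-scale value with a pseudocount of 1, so an estimate of 0 maps to 0."""
    return math.log10(min_estimate + 1)

estimate = apobec_mutload_min_estimate(100, 2.0, 0.01)
```

With an enrichment of exactly 2, half of the signature mutations are counted as APOBEC-induced, which matches the 50% threshold mentioned in the summary at the top of this section.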

Figure 6. Numbers of different types of mutation clusters. Values are taken from the columns with the corresponding names in the “*_sorted_sum_clusters.txt” file.

Figure 7. The number of mutations within C- or G-coordinated clusters occurring in different known C- or G-specific mutation motifs. Counts are totaled across all samples. “C/G” – any mutation in C:G base pairs; “TC/GA” – any mutation in the less stringent APOBEC motif tC (mutated nucleotide capitalized); “TCW/WGA” – any mutation in the stringent APOBEC motif tCw; “WRC/GYW” – any mutation in the wrC motif for AID cytidine deaminase (r=A or G; y=C or T); “CC/GG” – any mutation in the cC motif for APOBEC3G; “CG/CG” – any mutation in the Cg (or CpG) motif, which is frequently methylated (5me-C) and prone to deamination. Fold enrichments are calculated using values from the corresponding columns (including complementary mutations) in the “Totals” row of the “*_sorted_sum_G_C_clusters.txt” file. Fold enrichments for each signature (shown as numbers above the bars) are calculated in the same way as the fold enrichment for the APOBEC mutation motif. All three possible substitutions of C are included (i.e. to A, T and G). When tCw-specific APOBECs are the source of clustered mutations, both tC and tCw should be enriched, with the tCw enrichment being the greater of the two. Moreover, other C-specific mutation signatures should be depleted.
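To make the motif categories in this legend concrete, the sketch below assigns a single mutated base to the categories it matches. It is not the pipeline's implementation: the function name and the input convention (a 5-base context string with the mutated base in the centre) are assumptions made for illustration. Mutations at G are handled by reverse-complementing the context so that the same C-centred rules apply.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_categories(context5):
    """Return the Figure 7 categories matched by one mutated base.

    context5 -- hypothetical input: 5 bases, mutated base at index 2.
    """
    ctx = context5.upper()
    if ctx[2] == "G":
        # Complementary strand: reverse-complement so the centre becomes C.
        ctx = ctx.translate(_COMPLEMENT)[::-1]
    if ctx[2] != "C":
        return set()  # mutation in an A:T base pair, no C/G motif applies
    minus2, minus1, plus1 = ctx[0], ctx[1], ctx[3]
    hits = {"C/G"}
    if minus1 == "T":
        hits.add("TC/GA")                      # tC
        if plus1 in "AT":
            hits.add("TCW/WGA")                # tCw, w = A or T
    if minus2 in "AT" and minus1 in "AG":
        hits.add("WRC/GYW")                    # wrC, w = A or T, r = A or G
    if minus1 == "C":
        hits.add("CC/GG")                      # cC
    if plus1 == "G":
        hits.add("CG/CG")                      # Cg (CpG)
    return hits

example = motif_categories("ATCAG")
```

A mutated G in the context `CTGAT` is the reverse complement of the C in `ATCAG`, so both calls return the same set of categories.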

Figure 8. The number of mutations categorized into the three possible types of substitution at the tCw motif (complementary mutations included) within C- or G-coordinated clusters. Numbers are calculated using the “Totals” row in the corresponding columns of the “*_sorted_sum_G_C_clusters.txt” file. Mutations caused by APOBEC-mediated cytidine deamination in stretches of ssDNA are expected to be predominantly C→T or C→G, with very few C→A. Base substitution categories are shown within the TCW context.

Figure 9. Fold enrichment of the APOBEC mutation signature in clusters of different sizes, shown for two cluster categories (C- or G-coordinated and non-coordinated) as well as for non-clustered (scattered) mutations. Smaller clusters usually show less enrichment because they are more likely to have been formed by random mutations that happened to occur close to each other. Values above the bars show the total number of mutation clusters in each class and, in parentheses, the total number of APOBEC signature mutations in that class. Values for fold enrichment as well as mutation and mutation cluster counts are taken from the following output files, which can be downloaded with firehose_get (https://confluence.broadinstitute.org/display/GDAC/Download):
“*_sorted_sum10c.txt” for scattered mutations;
“*_sorted_sum04a.txt” for C- or G-coordinated clusters with 2 mutations;
“*_sorted_sum04b.txt” for C- or G-coordinated clusters with 3 mutations;
“*_sorted_sum04f.txt” for C- or G-coordinated clusters with 4 mutations;
“*_sorted_sum04h.txt” for C- or G-coordinated clusters with 5 mutations;
“*_sorted_sum04i.txt” for C- or G-coordinated clusters with >5 mutations;
“*_sorted_sum01a.txt” for non-coordinated clusters with 2 mutations;
“*_sorted_sum01b.txt” for non-coordinated clusters with 3 mutations;
“*_sorted_sum01f.txt” for non-coordinated clusters with 4 mutations;
“*_sorted_sum01h.txt” for non-coordinated clusters with 5 mutations;
“*_sorted_sum01i.txt” for non-coordinated clusters with >5 mutations;
and are compiled in the file with the summary of calculation results “*_sorted_sum_APOBECenrich_ClustSize.txt”.
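The file naming scheme listed above can be captured in a small lookup, sketched below for anyone scripting against these outputs. The function name and the category strings are hypothetical; the numeric prefixes and letter suffixes are copied from the list, and the leading “*” is kept as a wildcard because the actual file prefix is not given here.

```python
# Cluster category -> numeric prefix, as listed in the Figure 9 legend.
_CATEGORY_PREFIX = {"coordinated": "04", "non-coordinated": "01"}
# Number of mutations in the cluster -> letter suffix; "i" is used for >5.
_SIZE_SUFFIX = {2: "a", 3: "b", 4: "f", 5: "h"}

def cluster_file_pattern(category, n_mutations=None):
    """Return the output file pattern for one Figure 9 class (hypothetical helper)."""
    if category == "scattered":
        return "*_sorted_sum10c.txt"
    if n_mutations is None or n_mutations < 2:
        raise ValueError("clusters contain at least 2 mutations")
    suffix = _SIZE_SUFFIX.get(n_mutations, "i")
    return f"*_sorted_sum{_CATEGORY_PREFIX[category]}{suffix}.txt"

pattern = cluster_file_pattern("coordinated", 4)
```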
