Column Title	Description
A	A binary identifier where “1” indicates that a mutation occurs in the sequence A. The mutated base is capitalized.
T	A binary identifier where “1” indicates that a mutation occurs in the sequence T. The mutated base is capitalized.
G	A binary identifier where “1” indicates that a mutation occurs in the sequence G. The mutated base is capitalized.
C	A binary identifier where “1” indicates that a mutation occurs in the sequence C. The mutated base is capitalized.
Cg	A binary identifier where “1” indicates that a mutation occurs in the sequence Cg. The mutated base is capitalized.
cG	A binary identifier where “1” indicates that a mutation occurs in the sequence cG. The mutated base is capitalized.
tCw	A binary identifier where “1” indicates that a mutation occurs in the sequence tCw. The mutated base is capitalized.
wGa	A binary identifier where “1” indicates that a mutation occurs in the sequence wGa. The mutated base is capitalized.
tCa	A binary identifier where “1” indicates that a mutation occurs in the sequence tCa. The mutated base is capitalized.
tGa	A binary identifier where “1” indicates that a mutation occurs in the sequence tGa. The mutated base is capitalized.
tCt	A binary identifier where “1” indicates that a mutation occurs in the sequence tCt. The mutated base is capitalized.
aGa	A binary identifier where “1” indicates that a mutation occurs in the sequence aGa. The mutated base is capitalized.
tC	A binary identifier where “1” indicates that a mutation occurs in the sequence tC. The mutated base is capitalized.
Ga	A binary identifier where “1” indicates that a mutation occurs in the sequence Ga. The mutated base is capitalized.
tCh	A binary identifier where “1” indicates that a mutation occurs in the sequence tCh. The mutated base is capitalized.
dGa	A binary identifier where “1” indicates that a mutation occurs in the sequence dGa. The mutated base is capitalized.
cC	A binary identifier where “1” indicates that a mutation occurs in the sequence cC. The mutated base is capitalized.
Gg	A binary identifier where “1” indicates that a mutation occurs in the sequence Gg. The mutated base is capitalized.
wrC	A binary identifier where “1” indicates that a mutation occurs in the sequence wrC. The mutated base is capitalized.
Gyw	A binary identifier where “1” indicates that a mutation occurs in the sequence Gyw. The mutated base is capitalized.
Cc	A binary identifier where “1” indicates that a mutation occurs in the sequence Cc. The mutated base is capitalized.
gG	A binary identifier where “1” indicates that a mutation occurs in the sequence gG. The mutated base is capitalized.
wA	A binary identifier where “1” indicates that a mutation occurs in the sequence wA. The mutated base is capitalized.
Tw	A binary identifier where “1” indicates that a mutation occurs in the sequence Tw. The mutated base is capitalized.
tC_mutation	A binary identifier for mutations that fit the extended signature (tC to tT or tC to tG) of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tC” to a “Tumor_Seq_Allele2” value of “G” or “T”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “Ga” to a “Tumor_Seq_Allele2” value of “C” or “A”. Note: This column combines information from the columns “tC_mutation_to_G” and “tC_mutation_to_T”.
tC_mutation_to_G	A binary identifier for mutations that fit the extended signature (tC to tG) of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tC” to a “Tumor_Seq_Allele2” value of “G”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “Ga” to a “Tumor_Seq_Allele2” value of “C”.
tC_mutation_to_T	A binary identifier for mutations that fit the extended signature (tC to tT) of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tC” to a “Tumor_Seq_Allele2” value of “T”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “Ga” to a “Tumor_Seq_Allele2” value of “A”.
APOBEC_mutation	A binary identifier for mutations that fit the signature of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tCw” (w=A or T) to a “Tumor_Seq_Allele2” value of “G” or “T”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “wGa” (w=A or T) to a “Tumor_Seq_Allele2” value of “C” or “A”. Note: This column combines information from the columns “APOBEC_mutation_to_G” and “APOBEC_mutation_to_T”.
APOBEC_mutation_to_G	A binary identifier for mutations that fit the signature of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tCw” (w=A or T) to a “Tumor_Seq_Allele2” value of “G”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “wGa” (w=A or T) to a “Tumor_Seq_Allele2” value of “C”.
APOBEC_mutation_to_T	A binary identifier for mutations that fit the signature of an APOBEC-induced mutation. “1” indicates that a mutation fits one of the following criteria: 1) the mutation changes a “Reference_Allele” C in the sequence motif “tCw” (w=A or T) to a “Tumor_Seq_Allele2” value of “T”; 2) the mutation changes a “Reference_Allele” G in the sequence motif “wGa” (w=A or T) to a “Tumor_Seq_Allele2” value of “A”.
[tCw_to_G+tCw_to_T]_per_mut	The fraction of total mutations in a sample that fit one of the following: cytosine to guanine substitutions in the sequences tct or tca; cytosine to thymine substitutions in the sequences tct or tca; guanine to cytosine substitutions in the sequences tga or aga; guanine to adenine substitutions in the sequences tga or aga. Values are obtained from the "[tCw_to_G+tCw_to_T]_per_mut" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
tCw_to_G+tCw_to_T	The number of cytosine to guanine substitutions in the sequences tct or tca, cytosine to thymine substitutions in the sequences tct or tca, guanine to cytosine substitutions in the sequences tga or aga, and guanine to adenine substitutions in the sequences tga or aga in the specified sample. Values are obtained from the "tCw_to_G+tCw_to_T" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
BH_Fisher_p-value_tCw	The p-value evaluating a statistical over-representation of tCw and wGa mutations in a sample. It is generated by correcting the “Fisher_p-value_tCw” in the "*_sum_all_fisher_Pcorr.txt" file for multiple testing by the Benjamini-Hochberg method (produced using p.adjust() in R). Values are obtained from the "BH_Fisher_p-value_tCw" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
APOBEC_enrich	The enrichment over random of APOBEC pattern mutations in a sample. Values are obtained from the "APOBEC_enrich" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
tCw_to_G_enrich	The enrichment over random of APOBEC signature mutations specifically involving C to G changes occurring in a sample. Values are obtained from the "tCw_to_G_enrich" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
tCw_to_T_enrich	The enrichment over random of APOBEC signature mutations specifically involving C to T changes occurring in a sample. Values are obtained from the "tCw_to_T_enrich" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
p-value_GvT_skew	The p-value produced by comparing, for each sample, the ratio of ["tCw_to_G" + "wGa_to_C"] to ["tCw" + "wGa" - "tCw_to_G" - "wGa_to_C"] with the ratio of ["tCw_to_T" + "wGa_to_A"] to ["tCw" + "wGa" - "tCw_to_T" - "wGa_to_A"], using the columns of the "*_sum_all_fisher_Pcorr.txt" file, by a two-sided Fisher’s exact test. Values are obtained from the "p-value_GvT_skew" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
BH_p-value_GvT_skew	The p-value for each sample generated by correcting the “p-value_GvT_skew” of the "*_sum_all_fisher_Pcorr.txt" file for multiple testing by the Benjamini-Hochberg method (produced using p.adjust() in R). Values are obtained from the "BH_p-value_GvT_skew" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
APOBEC_MutLoad_MinEstimate	A minimum estimate of the number of APOBEC-induced mutations in a sample. This estimate is calculated using the formula ["tCw_to_G+tCw_to_T"]*[("APOBEC_enrich"-1)/"APOBEC_enrich"] to determine the number of APOBEC signature mutations in excess of what would be expected by random mutagenesis. Calculated values are rounded to the nearest whole number. APOBEC_MutLoad_MinEstimate is calculated only for samples with a BH_Fisher_p-value_tCw value less than or equal to 0.05, signifying a statistical over-representation of APOBEC mutagenesis. Samples with a BH_Fisher_p-value_tCw value greater than 0.05 receive a value of 0. Values are obtained from the "APOBEC_MutLoad_MinEstimate" column of the "*_sum_all_fisher_Pcorr.txt" file. The same value characterizing each sample is reported for all mutations within a sample.
CONTEXT(+/-20)	41 nucleotides of reference DNA sequence, starting 20 nucleotides 5' of a mutation and ending 20 nucleotides 3' of the mutation. Flanking sequences are obtained from the corresponding human genome reference used for mapping and variant detection.
a_counts	The total number of times the sequence 'a' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
t_counts	The total number of times the sequence 't' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
g_counts	The total number of times the sequence 'g' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
c_counts	The total number of times the sequence 'c' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
cg_counts	The total number of times the sequence 'cg' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tcw_counts	The total number of times the sequence 'tcw' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
wga_counts	The total number of times the sequence 'wga' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tca_counts	The total number of times the sequence 'tca' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tga_counts	The total number of times the sequence 'tga' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tct_counts	The total number of times the sequence 'tct' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
aga_counts	The total number of times the sequence 'aga' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tc_counts	The total number of times the sequence 'tc' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
ga_counts	The total number of times the sequence 'ga' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tch_counts	The total number of times the sequence 'tch' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
dga_counts	The total number of times the sequence 'dga' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
cc_counts	The total number of times the sequence 'cc' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
gg_counts	The total number of times the sequence 'gg' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
wrc_counts	The total number of times the sequence 'wrc' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
gyw_counts	The total number of times the sequence 'gyw' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
cc_counts	The total number of times the sequence 'cc' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
gg_counts	The total number of times the sequence 'gg' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
wa_counts	The total number of times the sequence 'wa' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
tw_counts	The total number of times the sequence 'tw' occurs in the 41 nucleotide context surrounding a mutation. The 41 nucleotide context is provided in the column "CONTEXT(+/-20)".
RM327	“RM327” indicates that the location of a mutation is within the RM327 track (RepeatMasker v3.27 or later from the UCSC genome browser) describing the position of regional repeats. “-” indicates that a mutation does not fall in the RM327 track.
simpleRepeat	“simpleRepeat” indicates that a mutation occurs within the simpleRepeat track (http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=355487951&clade=mammal&org=Human&db=hg19&hgta_group=varRep&hgta_track=simpleRepeat&hgta_table=0&hgta_regionType=genome&position=chr21%3A33031597-33041570&hgta_outputType=primaryTable&hgta_outFileName=) describing the position of short sequence repeats. “-” indicates that a mutation does not fall in the simpleRepeat track. The mutation cluster analysis filters out mutations in the simpleRepeat track, as this annotation indicates low sequence complexity and increases the chance of false-positive mutations. All mutations provided in the MAF should have “-” within this field.
DistBetween_Mutations	The number of nucleotides between a mutation and the previous mutation on the same chromosome. This field is left blank for the first mutation on a chromosome.
Distance_to_LT_end	The number of nucleotides between a mutation and the start of the chromosome (i.e. the left telomere end).
Distance_to_RT_end	The number of nucleotides between a mutation and the end of the chromosome (i.e. the right telomere end).
Strain_Mutation_ID	A numeric identifier of a mutation, unique within a single Tumor_Sample_Barcode (i.e. one sample).
Dataset_Mutation_ID	A numeric identifier of a mutation, unique among all Tumor_Sample_Barcodes in the entire dataset (i.e. all samples).
Complex_ID	A number that uniquely identifies each group of mutations that exist within 10 nucleotides of each other, among all Tumor_Sample_Barcodes in the entire dataset (i.e. all samples). These types of changes are likely caused by a single event of error-prone DNA synthesis and are treated as a single mutation. Every row containing a nucleotide change that is a part of a complex event reports the same numeric complex ID. Rows containing mutations that are not part of complex events are left blank.
Complex_Size	The number of nucleotide changes with the same Complex_ID (i.e. the number of nucleotide changes that together are classified as a single complex mutation). Every row containing a nucleotide change that is a part of the same complex event reports the same value. Rows containing mutations that are not part of complex events are left blank.
StrainCluster_ID	A number that uniquely identifies a group of clustered mutations (i.e. a mutation cluster) within a single Tumor_Sample_Barcode (i.e. one sample). Every row containing a nucleotide change that is a part of a cluster reports the same strain cluster ID. Rows containing mutations that are not in clusters are left blank.
Dataset_Cluster_ID	A number, unique across all Tumor_Sample_Barcodes in the entire dataset (i.e. all samples), that identifies each mutation cluster and is placed in every row corresponding to a mutation from that cluster. Every row containing a mutation (base substitution, indel or complex) that is a part of a cluster reports the same dataset cluster ID. Rows containing mutations that are not in clusters are left blank. Thus every cluster is identified by a consecutive group of rows that have the same value in the "Dataset_Cluster_ID" column.
Distance_Between_Clusters	The number of nucleotides between a group of clustered mutations and the previous group of clustered mutations on the same chromosome. Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
Cluster_Size_Mutations	The number of nucleotide changes in a group of clustered mutations. If the mutation cluster contains complex mutations, each nucleotide change in the complex event is counted individually. This is equivalent to the number of rows containing the same dataset cluster ID. Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
Cluster_Size_Complexes	The number of mutations in a group of clustered mutations. If the mutation cluster contains complex mutations, the entire complex event is counted only once. This is equivalent to “Cluster_Size_Mutations” if the cluster contains no complex events. If the cluster has complex events, then “Cluster_Size_Complexes” will be less than “Cluster_Size_Mutations”. Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
Cluster_Length	The number of nucleotides from the first nucleotide change in a mutation cluster to the last nucleotide change. Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
Cluster_Coordination	Identifies a cluster’s strand-coordination class. Values of A, T, G, and C are reported respectively if all mutations in a cluster occur with a Reference_Allele of A, T, G, or C and the cluster contains no complex events or frameshifts. A value of N is reported for a cluster if there is any variation in the Reference_Allele among mutations or if the cluster contains a complex event or a frameshift. Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
Content_of_non_coordinated_cluster	The types of bases mutated in a non-coordinated cluster. This value describes only the types of bases mutated. Clusters containing multiple mutations in the same base will have that base indicated only once in this field (e.g. a non-coordinated cluster containing 4 mutations, 2 in A, 1 in C and 1 in G, will have the value A_C_G). Insertions are indicated with “-”. “NA” indicates that the cluster contains at least 1 complex mutation. The same value is provided for all mutations occurring in the same non-coordinated cluster, which can be identified by the same Dataset_Cluster_ID value and the value “N” in the Cluster_Coordination field. Only rows containing “N” in the Cluster_Coordination field contain a value. All other rows are blank.
Cluster_Pvalue	A calculated value for the likelihood of producing a given distribution of mutations in a cluster by random distribution of all the mutations in a single Tumor_Sample_Barcode (i.e. one sample). Every row containing a nucleotide change that is a part of a cluster reports the same value. Rows containing mutations that are not in clusters are left blank.
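
As an illustration of the APOBEC_mutation criteria and the APOBEC_MutLoad_MinEstimate formula described above, here is a minimal Python sketch. It is not the code that produced the file. The function names apobec_flags and mutload_min_estimate are hypothetical. The sketch assumes a single-base substitution whose reference base sits at the 21st position of the CONTEXT(+/-20) value, and it assumes that "rounded to the nearest whole number" rounds halves up, which the description does not specify.

```python
from math import floor

W = {"A", "T"}  # IUPAC code w = A or T


def apobec_flags(context, ref, alt):
    """Return (APOBEC_mutation_to_G, APOBEC_mutation_to_T, APOBEC_mutation) as 0/1.

    Assumption: the mutated base is index 20 of the 41-nucleotide context,
    with its 5' neighbour at index 19 and its 3' neighbour at index 21.
    """
    seq = context.upper()
    if len(seq) != 41:
        raise ValueError("expected a 41-nucleotide context")
    five, three = seq[19], seq[21]
    ref, alt = ref.upper(), alt.upper()

    in_tcw = five == "T" and ref == "C" and three in W  # motif tCw
    in_wga = five in W and ref == "G" and three == "A"  # motif wGa (other strand)

    to_g = int((in_tcw and alt == "G") or (in_wga and alt == "C"))
    to_t = int((in_tcw and alt == "T") or (in_wga and alt == "A"))
    return to_g, to_t, int(to_g or to_t)


def mutload_min_estimate(tcw_count, apobec_enrich, bh_p):
    """[tCw_to_G+tCw_to_T] * (APOBEC_enrich - 1) / APOBEC_enrich, or 0 if BH p > 0.05."""
    if bh_p > 0.05:
        return 0
    excess = tcw_count * (apobec_enrich - 1) / apobec_enrich
    return floor(excess + 0.5)  # assumed rounding rule: halves round up


# Example: a C>T change in the motif tCa, and a sample with 100 tCw mutations.
ctx = "A" * 19 + "TCA" + "A" * 19
print(apobec_flags(ctx, "C", "T"))           # (0, 1, 1)
print(mutload_min_estimate(100, 2.0, 0.01))  # 50
```

In the example, an enrichment of 2.0 means half of the 100 signature mutations are in excess of the random expectation, so the minimum estimate is 50. The same sample with a corrected p-value above 0.05 would receive 0.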