SNP6 Copy number analysis (GISTIC2)
Overview
Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.23.
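As a rough illustration of the barcode-based filtering step, the sketch below keeps only tumor samples. It assumes standard TCGA barcodes, where the fourth hyphen-separated field begins with a two-digit sample-type code (01 to 09 for tumor, 10 to 19 for normal). The function name and the example barcodes are hypothetical, and the pipeline's actual rule may differ.

```python
def is_tumor_barcode(barcode: str) -> bool:
    """Return True if a TCGA barcode's sample-type code marks a tumor (01-09)."""
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2 or not fields[3][:2].isdigit():
        return False  # not a recognizable TCGA sample barcode
    return 1 <= int(fields[3][:2]) <= 9


# Hypothetical barcodes, for illustration only.
barcodes = [
    "TCGA-02-0001-01C-01D-0182-01",  # 01 = primary solid tumor
    "TCGA-02-0001-10A-01D-0182-01",  # 10 = blood-derived normal
    "TCGA-02-0002-11A-01D-0182-01",  # 11 = solid tissue normal
]
tumors = [b for b in barcodes if is_tumor_barcode(b)]
print(tumors)
```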

Summary

There were 10 tumor samples used in this analysis: 14 significant arm-level results, 2 significant focal amplifications, and 6 significant focal deletions were found.

Results
Focal results

Figure 1.  Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1.  Amplifications table: 2 significant amplifications found. The comprehensive list of candidate genes for each peak follows the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband   Q value      Residual Q value   Wide Peak Boundaries       # Genes in Wide Peak
14q13.3    0.00072526   0.00072526         chr14:35850001-37820000    15
11q13.3    0.035381     0.035381           chr11:68840001-70000000    17
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 14q13.3.

Table S1.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
NKX2-1
FOXA1
PAX9
NKX2-8
MBIP
BRMS1L
SLC25A21
MIPOL1
SFTA3
TTC6
SLC25A21-AS1
NKX2-1-AS1
MIR4503
PTCSC3
LINC00609
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 11q13.3.

Table S2.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CCND1
CPT1A
FGF3
FGF4
IGHMBP2
FGF19
MYEOV
MRGPRD
MRGPRF
MRPL21
TPCN2
ORAOV1
LOC338694
MIR3164
MRGPRF-AS1
LINC01488
LOC102724265

Figure 2.  Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 2.  Deletions table: 6 significant deletions found. The comprehensive list of candidate genes for each peak follows the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband   Q value    Residual Q value   Wide Peak Boundaries       # Genes in Wide Peak
13q11      0.018693   0.018693           chr13:1-20140000           16
17p11.2    0.018693   0.018693           chr17:13010001-27130000    136
21p11.2    0.018693   0.018693           chr21:7820001-8989999      11
22p11.2    0.021148   0.020717           chr22:10960001-16940000    14
9p23       0.055391   0.055391           chr9:10620001-14620000     8
13q31.1    0.06422    0.063154           chr13:71860002-91399999    80
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 13q11.

Table S3.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
TUBA3C
ZMYM2
ZMYM5
FAM230C
MPHOSPH8
PSPC1
TPTE2
ANKRD20A9P
LINC00442
ANKRD26P3
LINC00421
LINC00408
LINC00350
LINC00417
LOC101928697
LINC01072
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 17p11.2.

Table S4.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
ADORA2B
ALDH3A1
ALDH3A2
COX10
DRG2
FOXO3B
FLII
KCNJ12
LLGL1
MEIS3P1
MFAP4
PMP22
MAPK7
MAP2K3
PRPSAP2
SHMT1
SREBF1
TOP3A
UBB
RNF112
COPS3
TMEM11
PIGL
NCOR1
ULK2
CCDC144A
HS3ST3B1
HS3ST3A1
PEMT
FBXW10
TRIM16
RAI1
GRAP
AKAP10
EPN2
MPRIP
USP22
TNFRSF13B
DHRS7B
SNORD49A
SNORD3B-1
B9D1
TVP23B
MYO15A
TRPV2
RASD1
ALKBH5
TTC19
MED9
SLC47A1
NT5M
ZNF286A
ZNF287
ZNF624
TEKT3
GID4
FAM106A
DRC3
MGC12916
ATPAF2
SPECC1
CDRT7
CDRT8
CDRT15P1
LRRC75A-AS1
ZSWIM7
MIEF2
SLC5A10
SMCR5
SMCR8
TOM1L2
SLC47A2
CDRT15
TRIM16L
USP32P1
TVP23C
CENPV
FLCN
PLD6
USP32P2
TBC1D28
CDRT15L2
NATD1
CDRT4
CCDC144B
FAM27E5
FLJ36000
LGALS9B
CCDC144NL
KRT17P5
LOC339260
C17orf51
CCDC144CP
TBC1D26
CDRT1
LRRC75A
LOC388436
KRT16P2
FLJ35934
GRAPL
CCDC144NL-AS1
CDRT15P2
FAM83G
KRT16P3
EVPLL
LGALS9C
SNORA59A
SNORD49B
SNORD65
MIR33B
KRT16P1
ZNF286B
SNORD3A
SNORD3C
SNORD3D
FAM106CP
KCNJ18
LOC100287072
MIR1288
MIR1180
MIR548H3
MTRNR2L1
TVP23C-CDRT4
MIR4731
RAI1-AS1
EPN2-AS1
COX10-AS1
EPN2-IT1
FAM106B
LINC01563
LOC101928475
LOC101928567
MIR6777
MIR6778
SMCR2
LOC105371703
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 21p11.2.

Table S5.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
MIR663A
RNA5-8S5
RNA18S5
RNA28S5
MIR3687-1
MIR3648-1
LOC100507412
RNA45S5
MIR6724-1
MIR6724-2
MIR6724-3
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 22p11.2.

Table S6.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
ANKRD62P1-PARP4P3
POTEH
HSFY1P1
OR11H1
CCT8L2
XKR3
TPTEP1
DUXAP8
POTEH-AS1
BMS1P17
LOC101929350
LOC102723769
LINC01297
BMS1P22
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 9p23.

Table S7.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
NFIB
TYRP1
MPDZ
LURAP1L
FLJ41200
LINC00583
LURAP1L-AS1
SNORD137
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 13q31.1.

Table S8.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
KLF5
BTF3P11
CLN5
EDNRB
LMO7
POU4F1
UCHL3
SCEL
TBC1D4
SPRY2
PIBF1
KLF12
DIS3
MYCBP2
SLITRK5
FBXL3
NDFIP2
RBM26
RNF219
BORA
SLITRK6
SLITRK1
KCTD12
SLAIN1
LINC00410
COMMD6
LINC00347
MIR17
MIR18A
MIR19A
MIR19B1
MIR20A
MIR92A1
MIR17HG
MZT1
MIR4500HG
CTAGE11P
MIR622
LMO7DN
ACOD1
LOC100129307
MIR3665
EDNRB-AS1
RBM26-AS1
MIR4500
LINC00564
LINC00560
LINC00331
LINC00333
LINC00351
LINC00353
LINC00379
LINC00381
LINC00392
LINC00433
LINC00440
LINC00446
LINC00559
NDFIP2-AS1
MYCBP2-AS1
RNF219-AS1
LINC01080
LMO7-AS1
LINC01069
LINC00382
LINC01049
LINC00380
LINC00397
LINC01038
LINC00377
LINC01068
LMO7DN-IT1
LINC00375
SCEL-AS1
LOC105370306
LINC01040
LINC01047
LINC00430
LINC01078
SNORA107
Arm-level results

Table 3.  Arm-level significance table: 14 significant results found. The significance cutoff is at Q value=0.25.

Arm  # Genes  Amp Frequency  Amp Frequency Score  Amp Z score  Amp Q value  Del Frequency  Del Frequency Score  Del Z score  Del Q value
1p 3092 0.30 0.75 1.72 0.51 0.60 0.86 2.88 0.0242
1q 2863 0.60 0.67 1.99 0.51 0.10 0.25 -0.418 0.836
2p 1500 0.40 0.40 0.017 0.983 0.00 0.00 -1.99 0.99
2q 2495 0.20 0.20 -1.07 0.983 0.00 0.00 -2.13 0.99
3p 1531 0.00 0.00 -0.81 0.983 0.90 0.90 3.26 0.0164
3q 1708 0.10 0.25 -0.574 0.983 0.60 0.67 1.7 0.164
4p 693 0.00 0.00 -2.11 0.983 0.40 0.40 -0.164 0.798
4q 1467 0.00 0.00 -1.82 0.983 0.50 0.50 0.655 0.512
5p 456 0.30 0.30 -0.855 0.983 0.00 0.00 -2.32 0.99
5q 2185 0.10 0.14 -1.26 0.983 0.30 0.33 -0.248 0.82
6p 1664 0.40 0.44 0.325 0.983 0.10 0.17 -1.13 0.99
6q 1304 0.20 0.33 -0.354 0.983 0.40 0.50 0.552 0.536
7p 926 0.50 0.62 1.19 0.802 0.20 0.40 -0.0792 0.773
7q 1717 0.30 0.50 0.554 0.983 0.40 0.57 0.986 0.372
8p 795 0.20 0.40 -0.0999 0.983 0.50 0.62 1.16 0.326
8q 1249 0.70 0.70 1.89 0.51 0.00 0.00 -1.43 0.99
9p 621 0.00 0.00 -0.865 0.983 0.90 0.90 3.02 0.0205
9q 1610 0.10 0.33 -0.213 0.983 0.70 0.78 2.36 0.0627
10p 767 0.10 0.11 -1.89 0.983 0.10 0.11 -1.89 0.99
10q 1968 0.20 0.25 -0.763 0.983 0.20 0.25 -0.763 0.957
11p 1162 0.20 0.40 -0.0418 0.983 0.50 0.62 1.24 0.303
11q 2133 0.10 0.17 -1.05 0.983 0.40 0.44 0.429 0.594
12p 804 0.50 0.50 0.501 0.983 0.00 0.00 -1.91 0.99
12q 2055 0.40 0.40 0.144 0.983 0.00 0.00 -1.91 0.99
13p 0 0.00 0.00 -1.28 0.983 0.80 0.80 2.23 0.0695
13q 1092 0.00 0.00 -1.87 0.983 0.50 0.50 0.568 0.536
14p 0 0.30 0.60 0.675 0.983 0.50 0.71 1.41 0.256
14q 1829 0.50 0.71 1.79 0.51 0.30 0.60 0.984 0.372
15p 0 0.00 0.00 -2.02 0.983 0.50 0.50 0.318 0.621
15q 2082 0.00 0.00 -1.91 0.983 0.40 0.40 0.15 0.704
16p 1378 0.20 0.40 -0.00746 0.983 0.50 0.62 1.29 0.296
16q 1070 0.30 0.43 0.0868 0.983 0.30 0.43 0.0868 0.721
17p 937 0.20 0.50 0.336 0.983 0.60 0.75 1.91 0.123
17q 2298 0.30 0.43 0.324 0.983 0.30 0.43 0.324 0.621
18p 212 0.20 0.29 -0.835 0.983 0.30 0.38 -0.384 0.836
18q 644 0.20 0.40 -0.124 0.983 0.50 0.62 1.13 0.326
19p 1331 0.00 0.00 -0.822 0.983 0.90 0.90 3.2 0.0164
19q 2402 0.20 0.33 -0.165 0.983 0.40 0.50 0.788 0.46
20p 583 0.50 0.71 1.52 0.614 0.30 0.60 0.77 0.46
20q 1092 0.40 0.50 0.508 0.983 0.20 0.33 -0.39 0.836
21p 92 0.00 0.00 -1.56 0.983 0.70 0.70 1.61 0.184
21q 750 0.10 0.33 -0.317 0.983 0.70 0.78 2.15 0.0757
22p 2 0.00 0.00 -1.28 0.983 0.80 0.80 2.23 0.0695
22q 1258 0.00 0.00 -1.17 0.983 0.80 0.80 2.54 0.0535
Xp 945 0.50 0.62 1.19 0.802 0.20 0.40 -0.0762 0.773
Xq 1533 0.50 0.56 0.977 0.983 0.10 0.20 -0.897 0.978
Yp 113 0.20 0.50 0.218 0.983 0.60 0.75 1.73 0.164
Yq 245 0.20 0.67 0.786 0.983 0.70 0.88 2.47 0.054
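A minimal sketch of how the cutoff is applied: an arm counts as a significant result when its amplification or its deletion Q value falls below 0.25. The rows are a small excerpt copied from Table 3. The tuple layout, the variable names, and the use of a strict inequality are illustrative assumptions.

```python
Q_CUTOFF = 0.25

# (arm, amplification Q value, deletion Q value), excerpted from Table 3.
rows = [
    ("1p", 0.51, 0.0242),
    ("1q", 0.51, 0.836),
    ("3p", 0.983, 0.0164),
    ("14p", 0.983, 0.256),
    ("17p", 0.983, 0.123),
]

significant = []
for arm, amp_q, del_q in rows:
    if amp_q < Q_CUTOFF:
        significant.append((arm, "amp"))
    if del_q < Q_CUTOFF:
        significant.append((arm, "del"))

print(significant)
```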
Methods & Data
Input
Description
  • Segmentation File: The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2(copy number) - 1).

  • Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).

  • Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.

  • CNV Files: There are two options for the cnv file. The first option allows CNVs to be identified by marker name. The second option allows the CNVs to be identified by genomic location. Option #1: A two column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A 6 column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End

  • Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.

  • Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.

  • Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.

  • Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.

  • Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).

  • Confidence Level: Confidence level used to calculate the region containing a driver.

  • Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.

  • Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).

  • Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.

  • Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
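The segmentation format and the threshold and cap parameters described above can be sketched as follows. This is an illustration under stated assumptions, not GISTIC's own code: the function names and the example row are hypothetical, and the default values are simply the ones listed for this run.

```python
AMP_THRESHOLD = 0.1  # amplification threshold listed for this run
DEL_THRESHOLD = 0.1  # deletion threshold listed for this run
CAP = 1.5            # cap value listed for this run


def parse_seg_line(line: str) -> dict:
    """Parse one data row of a six-column, tab-delimited segmentation file."""
    sample, chrom, start, end, n_markers, seg_cn = line.rstrip("\n").split("\t")
    return {
        "sample": sample,
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "n_markers": int(n_markers),
        "seg_cn": float(seg_cn),
    }


def cap_value(seg_cn: float, cap: float = CAP) -> float:
    """Clamp a log2 ratio to the interval [-cap, +cap]."""
    return max(-cap, min(cap, seg_cn))


def classify(seg_cn: float, t_amp: float = AMP_THRESHOLD,
             t_del: float = DEL_THRESHOLD) -> str:
    """Label a segment using the amplification and deletion thresholds."""
    if seg_cn > t_amp:
        return "amplified"
    if seg_cn < -t_del:
        return "deleted"
    return "neutral"


def copy_number(seg_cn: float) -> float:
    """Invert Seg.CN = log2(copy number) - 1."""
    return 2.0 ** (seg_cn + 1.0)


# Hypothetical row, for illustration only.
seg = parse_seg_line("SAMPLE_A\t1\t100000\t2500000\t120\t2.3\n")
capped = cap_value(seg["seg_cn"])
print(seg["sample"], capped, classify(capped), copy_number(0.0))
```

A Seg.CN of 0 corresponds to two copies, so the thresholds are symmetric around the diploid state.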

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

  • Segmentation File = /cromwell_root/fc-e7058367-eaa6-44b5-aab5-1ec08acf146a/cptac3_luad_cnv.G4.seg.txt

  • Markers File = ./this_file_does_not_exist.txt

  • Reference Genome = /cromwell_root/broad-institute-gdac/gdc/reference/gistic/gistic2.refgene.hg38.UCSC.add_miR.160920.mat

  • CNV Files = /cromwell_root/broad-institute-gdac/gdc/reference/gistic/hg38_GDC_SNP6_CNV_list.161107.txt

  • Amplification Threshold = 0.1

  • Deletion Threshold = 0.1

  • Cap Values = 1.5

  • Broad Length Cutoff = 0.5

  • Remove X-Chromosome = 0

  • Confidence Level = 0.99

  • Join Segment Size = 4

  • Arm Level Peel Off = 1

  • Maximum Sample Segments = 2000

  • Gene GISTIC = 1
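For readers reproducing the run, the sketch below assembles these values into a GISTIC2-style argument list. The executable name and the flag names (-b, -seg, -refgene, -cnv, -ta, -td, -cap, -brlen, -rx, -conf, -js, -armpeel, -maxseg, -genegistic) are assumed from the GISTIC2 command-line documentation and should be verified against the installed version. The file and directory names are placeholders, not the paths used in this run.

```python
def build_gistic2_args(options: dict) -> list:
    """Flatten an ordered {flag: value} mapping into an argument list."""
    args = ["gistic2"]  # assumed executable name
    for flag, value in options.items():
        args.extend([flag, str(value)])
    return args


# Flag names are assumptions to verify; file names are placeholders.
options = {
    "-b": "gistic_out",
    "-seg": "segments.seg.txt",
    "-refgene": "refgene.mat",
    "-cnv": "cnv_list.txt",
    "-ta": 0.1,
    "-td": 0.1,
    "-cap": 1.5,
    "-brlen": 0.5,
    "-rx": 0,
    "-conf": 0.99,
    "-js": 4,
    "-armpeel": 1,
    "-maxseg": 2000,
    "-genegistic": 1,
}
args = build_gistic2_args(options)
print(" ".join(args))
```

The list is only built and printed here; nothing is executed.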

Table 4.  First 10 out of 10 Input Tumor Samples.

Tumor Sample Names
C3L-00094_TP
C3L-00913_TP
C3L-01632_TP
C3L-02345_TP
C3N-00180_TP
C3N-00217_TP
C3N-00547_TP
C3N-01016_TP
C3N-02089_TP
C3N-02155_TP

Figure 3.  Segmented copy number profiles in the input data

Output
All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

  1. Unique Name: A name assigned to identify the region.

  2. Descriptor: The genomic descriptor of that region.

  3. Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

  4. Peak Limits: The boundaries of the region of maximal amplification or deletion.

  5. Region Limits: The boundaries of the entire significant region of amplification or deletion.

  6. Q values: The Q value of the peak region.

  7. Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

  8. Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').

  9. Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through the end). The data contained in these columns vary slightly by section of the file.

The first section can be identified by the key given in column 9. It starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' It exactly reproduces the first section, except that the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that only broad events are included. A '1' in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.
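The relationship between the summarized codes and the actual copy changes can be sketched as below. The thresholds are hypothetical stand-ins for the low and high values given in the column 9 key, and comparing the magnitude of the change (so that one function serves both amplification and deletion peaks) is an assumption of this sketch.

```python
def lesion_call(change: float, low: float, high: float) -> int:
    """Map an actual copy change to the 0/1/2 summary code."""
    magnitude = abs(change)  # deletions are negative; compare by magnitude
    if magnitude > high:
        return 2
    if magnitude > low:
        return 1
    return 0


# Hypothetical thresholds, for illustration only.
LOW, HIGH = 0.1, 0.9
calls = [lesion_call(c, LOW, HIGH) for c in (0.05, 0.4, -1.2)]
print(calls)
```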

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

  1. Cytoband

  2. Q value

  3. Residual Q value

  4. Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the GenePattern SNPViewer module and may be imported into the Integrative Genomics Viewer (IGV).
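Because the scores file stores -log10(q) rather than q itself, a small conversion helps when comparing against the Q value cutoff of 0.25 used elsewhere in this report. The helper names below are illustrative.

```python
import math


def neg_log10_q(q: float) -> float:
    """Convert a Q value to the -log10(q) scale used in the scores file."""
    return -math.log10(q)


def q_from_neg_log10(score: float) -> float:
    """Recover the Q value from a -log10(q) score."""
    return 10.0 ** (-score)


cutoff_score = neg_log10_q(0.25)  # loci scoring above this pass Q < 0.25
print(round(cutoff_score, 3), q_from_neg_log10(2.0))
```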

Segmented Copy Number (raw_copy_number.{fig|pdf|png})

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by the robust peak-boundary algorithm; it is contained within the wide peak reported in {amp|del}_genes.txt, differing from it by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions have negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the most extreme (lowest) deletion value among the markers it covers.
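One reading of the extreme method, sketched below, assigns a gene the marker value with the largest absolute magnitude. How GISTIC resolves a gene that covers both gained and lost markers is not specified here, so that behavior, along with the function name and example values, is an assumption.

```python
def extreme_value(marker_values):
    """Return the marker value of largest magnitude (sign preserved)."""
    if not marker_values:
        raise ValueError("a gene must cover at least one marker")
    return max(marker_values, key=abs)


# Hypothetical marker values in units of (copy number - 2).
print(extreme_value([0.1, 0.8, 0.3]))     # amplified gene
print(extreme_value([-0.2, -1.1, -0.4]))  # deleted gene
```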

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events, that is, events with lengths below the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplification larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletion greater than the minimum arm-level deletion observed for the sample.
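The five-level coding can be sketched as a simple cascade of comparisons. The per-sample cutoffs are taken as given inputs here; the function name, the argument names, and the example numbers are hypothetical, and the sketch assumes the high cutoff is at least the amplification threshold and the low cutoff is at most the negated deletion threshold.

```python
def thresholded_call(value: float, t_amp: float, t_del: float,
                     low_cutoff: float, high_cutoff: float) -> int:
    """Map a gene-level value in (copy number - 2) units to -2..2."""
    if value > high_cutoff:
        return 2
    if value > t_amp:
        return 1
    if value < low_cutoff:
        return -2
    if value < -t_del:
        return -1
    return 0


# Hypothetical cutoffs for one sample.
values = [1.2, 0.5, 0.0, -0.3, -0.8]
calls = [thresholded_call(v, 0.1, 0.1, -0.6, 0.9) for v in values]
print(calls)
```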

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish high-level alterations (+/-2) from ordinary alterations (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. The low cutoff is calculated as the minimum arm-level copy number value minus the deletion threshold, and the high cutoff as the maximum arm-level copy number value plus the amplification threshold.
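The cutoff calculation described above amounts to two lines of arithmetic per sample. The sketch below uses hypothetical arm-level values; the function name is illustrative.

```python
def sample_cutoffs(arm_values, t_amp: float, t_del: float):
    """Return (low, high) cutoffs from one sample's arm-level values."""
    low = min(arm_values) - t_del   # minimum arm level less deletion threshold
    high = max(arm_values) + t_amp  # maximum arm level plus amplification threshold
    return low, high


# Hypothetical arm-level values in (copy number - 2) units for one sample.
low, high = sample_cutoffs([-0.5, 0.0, 0.8], t_amp=0.1, t_del=0.1)
print(round(low, 3), round(high, 3))
```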

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks. It can be loaded as a track in a compatible genome browser such as IGV or the UCSC Genome Browser.
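For orientation, the sketch below converts a region string of the form used in the wide peak tables into a BED-style line. BED uses zero-based, half-open coordinates. Treating the report's boundaries as one-based and inclusive is an assumption, as is the choice of a four-column layout; the function name is hypothetical.

```python
def region_to_bed(region: str, name: str) -> str:
    """Convert 'chrN:start-end' (assumed 1-based, inclusive) to a BED line."""
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return "\t".join([chrom, str(start - 1), str(end), name])


bed_line = region_to_bed("chr14:35850001-37820000", "14q13.3")
print(bed_line)
```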

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) it identifies significantly amplified or deleted chromosome arms; and (ii) it identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in-depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
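The multiple-testing correction step can be illustrated with the Benjamini-Hochberg false discovery rate procedure, a common way of turning p-values into Q values. This section does not name the correction, so its use here is an assumption; the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-locus p-values.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(v, 4) for v in q])
```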

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in-depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and a second one comes from an analysis of significant variations among the blood normals in the TCGA data set.
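Excluding CNV regions by genomic location reduces to an interval-membership check per marker. The sketch below is illustrative only: the function names, the tuple layouts, and the coordinates are hypothetical, and it is not GISTIC's implementation.

```python
def in_any_region(chrom: str, pos: int, regions) -> bool:
    """True if (chrom, pos) falls inside any (chrom, start, end) region."""
    return any(chrom == r_chrom and start <= pos <= end
               for r_chrom, start, end in regions)


def drop_cnv_markers(markers, cnv_regions):
    """Keep only markers (name, chrom, pos) lying outside every CNV region."""
    return [m for m in markers if not in_any_region(m[1], m[2], cnv_regions)]


# Hypothetical CNV region and markers.
cnv_regions = [("1", 1000, 2000)]
markers = [("m1", "1", 500), ("m2", "1", 1500), ("m3", "2", 1500)]
kept = drop_cnv_markers(markers, cnv_regions)
print(kept)
```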

References
[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A Vol. 104:50 (2007)
[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)
[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)
[6] McCarroll, S. A. et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40(10):1166-1174 (2008)