SNP6 Copy number analysis (GISTIC2)
Kidney Chromophobe (Primary solid tumor)
02 April 2015  |  analyses__2015_04_02
Maintainer Information
Maintained by TCGA GDAC Team (Broad Institute/MD Anderson Cancer Center/Harvard Medical School)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2015): SNP6 Copy number analysis (GISTIC2). Broad Institute of MIT and Harvard. doi:10.7908/C13J3C06
Overview
Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.22 (Firehose task version: 140).

Summary

There were 66 tumor samples used in this analysis: 14 significant arm-level results, 2 significant focal amplifications, and 0 significant focal deletions were found.

Results
Focal results

Figure 1.  Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1.  Amplifications Table - 2 significant amplifications found. The comprehensive list of candidate genes for each peak follows the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband Q value Residual Q value Wide Peak Boundaries # Genes in Wide Peak
8q11.23 0.2211 0.2211 chr8:51697150-53706540 7
15q22.31 0.2211 0.2211 chr15:1-66428823 449
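
The wide peak boundaries in Table 1 use a chrN:start-end notation. As a minimal sketch (the function names are illustrative and not part of GISTIC), the following parses that notation and reports the coordinate span of each peak. It shows that the 15q22.31 wide peak starts at position 1 of chromosome 15, which is consistent with its large gene count.

```python
import re


def parse_region(region):
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr[0-9XY]+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def region_span(region):
    """Return end - start for the region, in base pairs."""
    _, start, end = parse_region(region)
    return end - start


# Wide peak boundaries copied from Table 1.
for cytoband, region in [("8q11.23", "chr8:51697150-53706540"),
                         ("15q22.31", "chr15:1-66428823")]:
    print(cytoband, region_span(region))
```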
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 8q11.23.

Table S1.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
SNORA7|ENSG00000201316.1
ST18
RB1CC1
SNTG1
PCMTD1
PXDNL
FAM150A
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 15q22.31.

Table S2.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
BUB1B
TCF12
MIR4511
snoU13|ENSG00000238311.1
snoU13|ENSG00000238715.1
SNORA24|ENSG00000206903.1
CLPX
MTFMT
RN7SL348P
MIR1272
RN7SL707P
RN7SL595P
SNORA48|ENSG00000252774.1
USP3
CA12
MIR190A
RN7SL613P
RNA5SP397
RNA5SP396
snoU13|ENSG00000238767.1
MIR2116
C15ORF31
U3|ENSG00000200318.1
RN7SKP95
snoU13|ENSG00000239100.1
MYZAP
LINC00926
snoU13|ENSG00000239035.1
RN7SL568P
snoU13|ENSG00000238513.1
PYGO1
MIR628
RSL24D1
ONECUT1
MIR1266
U6|ENSG00000272337.1
RN7SL354P
RNA5SP395
U6|ENSG00000271819.1
RN7SL494P
RNA5SP394
FGF7
RN7SL307P
RN7SL577P
RN7SKP139
FKSG62
RN7SKP101
SNORD11|ENSG00000238819.1
SNORA41|ENSG00000207516.1
HMGN2P46
snoU13|ENSG00000238583.1
SLC30A4
SNORA11|ENSG00000261709.2
SORD
B2M
RN7SL347P
HYPK
CATSPER2P1
snoU13|ENSG00000238494.1
snoU13|ENSG00000238535.1
RN7SL487P
CCNDBP1
snoU13|ENSG00000239025.1
MIR627
MIR4310
RNA5SP393
MIR626
RN7SL497P
RN7SL376P
snoU13|ENSG00000238559.1
LINC00594
RNA5SP392
LINC00984
snoU13|ENSG00000238564.1
THBS1
FAM98B
U3|ENSG00000212511.1
CSNK1A1P1
MIR3942
ANP32AP1
GJD2
SNORA18|ENSG00000252425.1
SLC12A6
TMCO5B
SNORD77|ENSG00000212415.1
snoU13|ENSG00000238342.1
RN7SL286P
RN7SL539P
GOLGA8O
U8|ENSG00000206987.1
ULK4P1
RN7SL185P
GOLGA8K
CHRNA7
SNORA18|ENSG00000206849.1
MIR211
RN7SL82P
U8|ENSG00000252602.1
RN7SL628P
GOLGA8H
U8|ENSG00000207430.1
ULK4P2
RN7SL796P
GOLGA8Q
RN7SL196P
GOLGA8R
U8|ENSG00000238519.1
RN7SL469P
GOLGA8T
U8|ENSG00000207432.1
ULK4P3
RN7SL673P
GOLGA8J
snoZ278
GOLGA6L7P
WHAMMP2
RN7SL719P
GOLGA8M
RN7SL829P
RN7SL238P
GOLGA8F
RNA5SP391
LINC00929
SNORA48|ENSG00000212604.1
MIR4715
RNA5SP390
ATP10A
SNORD109B
SNORD115|ENSG00000212428.1
SNORD109A
SNORD108
SNORD64|ENSG00000270704.2
SNHG14
SNURF
snoU13|ENSG00000238615.1
PWRN1
PWRN2
MAGEL2
RN7SL536P
GOLGA8S
RN7SL106P
HERC2P2
RN7SL495P
WHAMMP3
RN7SL545P
GOLGA8DP
MIR1268A
OR4N4
snoU13|ENSG00000238960.1
DKFZP547L112
RN7SL400P
CT60
NBEAP1
RN7SL759P
GOLGA6L6
snoU13|ENSG00000239083.1
CHEK2P2
RN7SL584P
ACTC1
ADAM10
ANXA2
APBA2
AQP9
BNIP2
CAPN3
CHRM5
CKMT1B
CYP19A1
DUT
EPB42
FBN1
GABPB1
GABRA5
GABRB3
GABRG3
GALK2
GANC
GATM
GCHFR
PDIA3
GTF2A2
HDC
ITPKA
IVD
LIPC
LTK
MAP1A
MEIS2
MFAP1
TRPM1
MYO1E
MYO5A
NDN
NEDD4
OAZ2
OCA2
PLCB2
PPIB
MAPK6
RAB27A
RAD51
RORA
RYR3
SCG5
SLC12A1
SNRPN
SNX1
SPINT1
SRP14
TJP1
TP53BP1
TPM1
TYRO3
UBE3A
MKRN3
CILP
EIF3J
RAB11A
SNAP23
ALDH1A2
HERC2
HERC1
USP8
CCNB2
SLC28A2
SLC24A1
CCPG1
GCNT3
COPS2
TRIP4
TGM5
PIGB
IGDCC3
PPIP5K1
AQR
SECISBP2L
KIAA0101
ARHGAP11A
LCMT2
BCL2L10
PDCD7
RASGRP1
DENND4A
GNB5
ARPP19
SLC27A2
GPR176
CHP1
OIP5
BAHD1
FAN1
CEP152
MAPKBP1
GOLGA8A
ZNF609
RTF1
CYFIP1
MGA
DMXL2
VPS39
FAM189A1
AP4E1
DAPK2
EID1
NPAP1
SERF2
TMEM87A
RPAP1
BLOC1S6
GREM1
FOXB1
RPUSD2
TUBGCP4
SCG3
TMOD3
TMOD2
EHD4
DUOX2
MYEF2
RPS27L
NDUFAF1
NUSAP1
EMC4
RASL12
SPG21
SPTBN5
PTPLAD1
CTDSPL2
KLF13
RAB8B
DUOX1
CSNK1G1
DLL4
INO80
FAM63B
RNF111
ZNF280D
TRPM7
VPS13C
PPP1R14D
DPP8
MTMR10
PARP16
ZNF770
HAUS2
RMDN3
DNAJC17
MNS1
NOP10
MYO5C
NDNL2
FAM214A
EMC7
PAK6
DTWD1
CASC5
AVEN
STARD9
VPS18
IGDCC4
SQRDL
ZNF106
RFX7
SPATA5L1
CHAC1
NARG2
KATNBL1
SLTM
SNX22
ATP8B4
WDR76
TMEM62
SEMA6D
PIF1
SPG11
ELL3
PLEKHO2
POLR2M
VWA9
NIPA2
APH1B
TLN2
FAM96A
C15orf48
MEGF11
C15orf41
SPPL2A
ZFYVE19
CGNL1
FRMD5
DISP2
CHRFAM7A
ARHGAP11B
DPH6
C15orf57
KNSTRN
BMF
SHF
DUOXA1
LDHAL6B
CHST14
CASC4
LACTB
TUBGCP5
TGM7
CATSPER2
LEO1
SLC51B
NIPA1
PLA2G4E
TRIM69
C15orf43
C2CD4A
FAM81A
C15orf65
TMCO5A
ZSCAN29
TTBK2
CDAN1
STRC
DYX1C1
OTUD7A
SPRED1
PGBD4
ADAL
EXD1
FSIP1
RHOV
FAM227B
UBR1
PATL2
LPCAT4
PLA2G4F
LRRC57
LYSMD2
NUTM1
WDR72
SLC24A5
PRTG
GOLGA6L2
PLA2G4D
HERC2P3
GOLGA6L1
GOLGA8G
GOLGA8I
FBXL22
GLDN
FMN1
RBPMS2
ANKDD1A
USP50
TEX9
C15orf52
TNFAIP8L3
C2CD4B
GOLGA8EP
OR4M2
KBTBD13
UBAP1L
SHC4
CTXN2
C15orf53
C15orf54
DUOXA2
HERC2P9
GOLGA8B
EIF2AK4
UNC13C
MIR422A
CKMT1A
SERINC4
C15orf62
GOLGA8N
C15orf56
PHGR1
MIR147B
ANKRD63
JMJD7
PLA2G4B
POTEB2
MIR1282
MIR4311
MIR4508
MIR4510
MIR4716
MIR4713
MIR4712
GCOM1
POTEB

Figure 2.  Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Arm-level results

Table 2.  Arm-level significance table - 14 significant results found. The significance cutoff is at Q value=0.25.

Arm # Genes Amp Frequency Amp Z score Amp Q value Del Frequency Del Z score Del Q value
1p 1300 0.08 -1.84 0.995 0.82 8.73 0
1q 1195 0.07 -1.9 0.995 0.80 8.64 0
2p 624 0.10 -1.69 0.995 0.72 8.16 1.01e-15
2q 967 0.10 -1.88 0.995 0.72 7.54 9.33e-14
3p 644 0.12 -2.43 0.995 0.17 -1.72 1
3q 733 0.14 -2.33 0.995 0.14 -2.33 1
4p 289 0.37 2.34 0.292 0.02 -3.33 1
4q 670 0.37 1.79 0.449 0.02 -3.6 1
5p 183 0.14 -1.66 0.995 0.17 -1.16 1
5q 905 0.14 -2.38 0.995 0.17 -1.92 1
6p 710 0.07 -1.79 0.995 0.78 9.25 0
6q 556 0.07 -1.73 0.995 0.78 9.56 0
7p 389 0.37 2.19 0.292 0.02 -3.4 1
7q 783 0.37 1.64 0.449 0.02 -3.68 1
8p 338 0.29 0.655 0.69 0.20 -0.785 1
8q 551 0.30 0.616 0.69 0.18 -1.25 1
9p 301 0.16 -1.41 0.995 0.19 -0.92 1
9q 700 0.16 -1.81 0.995 0.19 -1.35 1
10p 253 0.18 -0.632 0.995 0.78 9.92 0
10q 738 0.12 -1.45 0.995 0.77 8.79 0
11p 509 0.25 -0.0927 0.995 0.14 -1.99 1
11q 975 0.25 -0.624 0.995 0.14 -2.42 1
12p 339 0.30 0.905 0.667 0.04 -3.26 1
12q 904 0.31 0.381 0.801 0.02 -3.97 1
13q 560 0.09 -1.92 0.995 0.67 7.43 2.02e-13
14q 938 0.33 0.697 0.69 0.04 -3.61 1
15q 810 0.33 0.858 0.667 0.04 -3.52 1
16p 559 0.34 1.36 0.449 0.09 -2.65 1
16q 455 0.34 1.58 0.449 0.11 -2.23 1
17p 415 0.00 -2.33 0.995 0.76 9.42 0
17q 972 0.00 -2.56 0.995 0.76 8.35 0
18p 104 0.29 1.1 0.555 0.16 -1.14 1
18q 275 0.29 0.733 0.69 0.20 -0.719 1
19p 681 0.29 0.383 0.801 0.02 -3.85 1
19q 935 0.27 -0.326 0.995 0.06 -3.51 1
20p 234 0.32 1.5 0.449 0.09 -2.44 1
20q 448 0.33 1.42 0.449 0.07 -2.9 1
21q 258 0.13 -1.47 0.995 0.56 5.92 4.83e-09
22q 564 0.33 1.11 0.555 0.17 -1.45 1
Xp 418 0.22 -0.372 0.995 0.65 7.06 2.76e-12
Xq 668 0.22 -0.563 0.995 0.65 6.62 5.58e-11
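
The count of significant arm-level results can be reproduced by comparing each Q value with the cutoff. The sketch below uses a handful of rows copied from Table 2 and assumes that "significant" means a Q value strictly below 0.25; the function name is illustrative.

```python
# A few rows copied from Table 2, in the same column order:
# arm, genes, amp freq, amp Z, amp Q, del freq, del Z, del Q
ARM_ROWS = """\
1p 1300 0.08 -1.84 0.995 0.82 8.73 0
3p 644 0.12 -2.43 0.995 0.17 -1.72 1
4p 289 0.37 2.34 0.292 0.02 -3.33 1
13q 560 0.09 -1.92 0.995 0.67 7.43 2.02e-13
21q 258 0.13 -1.47 0.995 0.56 5.92 4.83e-09
"""

Q_CUTOFF = 0.25  # significance cutoff stated in the table caption


def significant_arms(text, cutoff=Q_CUTOFF):
    """Return (arm, direction) pairs whose Q value is below the cutoff."""
    hits = []
    for line in text.splitlines():
        arm, _genes, _amp_f, _amp_z, amp_q, _del_f, _del_z, del_q = line.split()
        if float(amp_q) < cutoff:
            hits.append((arm, "amp"))
        if float(del_q) < cutoff:
            hits.append((arm, "del"))
    return hits


print(significant_arms(ARM_ROWS))
```

In this subset only deletions pass the cutoff; 4p has the lowest amplification Q value shown (0.292), which is above 0.25.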
Methods & Data
Input
Description
  • Segmentation File: The segmentation file contains the segmented data for all the samples, as identified by GLAD, CBS, or some other segmentation algorithm. (See the GLAD file format in the GenePattern file formats documentation.) It is a six-column, tab-delimited file with an optional first line identifying the columns. Positions are in base-pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2(copy number) - 1).

  • Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).

  • Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.

  • CNV Files: There are two options for the CNV file. The first option allows CNVs to be identified by marker name. The second option allows CNVs to be identified by genomic location. Option #1: A two-column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A six-column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End.

  • Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.

  • Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.

  • Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.

  • Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.

  • Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).

  • Confidence Level: Confidence level used to calculate the region containing a driver.

  • Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.

  • Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).

  • Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.

  • Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
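
The segmentation file layout described above can be read with a few lines of Python. This is a minimal sketch, not the GISTIC reader: the example rows, sample names, and function names are hypothetical, and the conversion assumes Seg.CN = log2(copy number) - 1, so that a diploid segment has Seg.CN = 0.

```python
import csv
import io
from typing import NamedTuple


class Segment(NamedTuple):
    sample: str
    chromosome: str
    start: int
    end: int
    num_markers: int
    seg_cn: float  # log2(copy number) - 1


def read_segments(handle):
    """Read six-column, tab-delimited segments, skipping an optional header."""
    segments = []
    for index, row in enumerate(csv.reader(handle, delimiter="\t")):
        if len(row) != 6:
            continue  # ignore blank or malformed lines in this sketch
        if index == 0 and not row[2].isdigit():
            continue  # optional first line naming the columns
        sample, chrom, start, end, markers, seg_cn = row
        segments.append(
            Segment(sample, chrom, int(start), int(end), int(markers), float(seg_cn))
        )
    return segments


def copy_number(seg_cn):
    """Invert Seg.CN = log2(copy number) - 1."""
    return 2.0 ** (seg_cn + 1.0)


# Hypothetical two-segment file; sample names and values are made up.
EXAMPLE = (
    "Sample\tChromosome\tStart Position\tEnd Position\tNum markers\tSeg.CN\n"
    "SAMPLE_A\t1\t1000\t50000\t25\t0.0\n"
    "SAMPLE_A\t1\t50001\t90000\t12\t-1.0\n"
)
segments = read_segments(io.StringIO(EXAMPLE))
for seg in segments:
    print(seg.sample, seg.chromosome, seg.start, seg.end, copy_number(seg.seg_cn))
```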

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

  • Segmentation File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/GDAC_MergeDataFilesPipeline/KICH-TP/14517846/GDAC_MergeDataFiles_12184691/KICH-TP.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.seg.txt

  • Markers File = /xchip/cga/reference/gistic2/genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt

  • Reference Genome = /xchip/cga/reference/gistic2/hg19_GENCODE_v18_20140127.mat

  • CNV Files = /xchip/cga/reference/gistic2/CNV.hg19.bypos.111213.txt

  • Amplification Threshold = 0.1

  • Deletion Threshold = 0.1

  • Cap Values = 1.5

  • Broad Length Cutoff = 0.7

  • Remove X-Chromosome = 0

  • Confidence Level = 0.99

  • Join Segment Size = 4

  • Arm Level Peel Off = 1

  • Maximum Sample Segments = 2000

  • Gene GISTIC = 1
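
To make the threshold and cap settings concrete, the sketch below applies the values listed above (amplification threshold 0.1, deletion threshold 0.1, cap 1.5) to individual log2 ratios. It illustrates the parameter definitions only; the function names are hypothetical, and the order in which GISTIC applies capping and thresholding internally is an assumption.

```python
AMP_THRESHOLD = 0.1  # Amplification Threshold from the list above
DEL_THRESHOLD = 0.1  # Deletion Threshold from the list above
CAP = 1.5            # Cap Values from the list above


def cap_value(log2_ratio, cap=CAP):
    """Clamp a log2 ratio to the interval [-cap, +cap]."""
    return max(-cap, min(cap, log2_ratio))


def classify(log2_ratio):
    """Label a log2 ratio as amplified, deleted, or neutral."""
    value = cap_value(log2_ratio)
    if value > AMP_THRESHOLD:
        return "amplified"
    if value < -DEL_THRESHOLD:
        return "deleted"
    return "neutral"


for ratio in (0.05, 0.3, -2.4):
    print(ratio, cap_value(ratio), classify(ratio))
```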

Table 3.  First 10 out of 66 Input Tumor Samples.

Tumor Sample Names
TCGA-KL-8323-01A-21D-2308-01
TCGA-KL-8324-01A-11D-2308-01
TCGA-KL-8325-01A-11D-2308-01
TCGA-KL-8326-01A-11D-2308-01
TCGA-KL-8327-01A-11D-2308-01
TCGA-KL-8328-01A-11D-2308-01
TCGA-KL-8329-01A-11D-2308-01
TCGA-KL-8330-01A-11D-2308-01
TCGA-KL-8331-01A-11D-2308-01
TCGA-KL-8332-01A-11D-2308-01
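
The pipeline separates tumor from normal samples by inspecting TCGA barcodes. The exact rule it uses is not given in this report, so the following is only a sketch based on the standard TCGA convention that the two digits at the start of the fourth barcode field give the sample type (01 to 09 for tumors, 10 to 19 for normals). The second barcode in the example is hypothetical.

```python
def sample_type_code(barcode):
    """Return the two-digit sample type code from a TCGA barcode."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return int(fields[3][:2])


def is_tumor(barcode):
    """True when the sample type code falls in the tumor range 01-09."""
    return 1 <= sample_type_code(barcode) <= 9


barcodes = [
    "TCGA-KL-8323-01A-21D-2308-01",  # from Table 3 (01 = primary solid tumor)
    "TCGA-KL-8323-11A-21D-2308-01",  # hypothetical normal-tissue barcode
]
tumors = [b for b in barcodes if is_tumor(b)]
print(tumors)
```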

Figure 3.  Segmented copy number profiles in the input data

Output
All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

  1. Unique Name: A name assigned to identify the region.

  2. Descriptor: The genomic descriptor of that region.

  3. Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

  4. Peak Limits: The boundaries of the region of maximal amplification or deletion.

  5. Region Limits: The boundaries of the entire significant region of amplification or deletion.

  6. Q values: The Q value of the peak region.

  7. Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

  8. Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').

  9. Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns vary slightly by section of the file.

The first section can be identified by the key given in column 9. It starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' The second section exactly reproduces the first section, except that here the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that here only broad events are included. A '1' in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.
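
As an illustration of the first-section encoding, the sketch below splits one row of the all lesions file into its nine region columns and its per-sample calls, then tallies the 0/1/2 values. The example row is hypothetical (three made-up samples and placeholder region fields), and the helper names are not part of GISTIC.

```python
from collections import Counter

N_REGION_COLUMNS = 9  # columns 1-9 describe the region; samples start at column 10
CALL_LABELS = {0: "none", 1: "low-level", 2: "high-level"}


def split_row(fields):
    """Separate the region description from the per-sample values."""
    return fields[:N_REGION_COLUMNS], fields[N_REGION_COLUMNS:]


def summarize_calls(fields):
    """Count how many samples carry each 0/1/2 call in a first-section row."""
    _, calls = split_row(fields)
    return Counter(CALL_LABELS[int(call)] for call in calls)


# Hypothetical row: nine placeholder region fields, then calls for three samples.
row = ["Amplification Peak 1", "8q11.23", "wide", "peak", "region",
       "0.2211", "0.2211", "focal", "key", "0", "1", "2"]
print(summarize_calls(row))
```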

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

  1. Cytoband

  2. Q value

  3. Residual Q value

  4. Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.
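
Because the amp genes file stores one peak per column, reading it usually means transposing the table. The sketch below assumes a tab-delimited layout whose first column holds row labels; that assumption and the label text are hypothetical, while the peak values and gene names are taken from Table 1 and the gene lists in this report.

```python
def read_peak_columns(text):
    """Return one dict per peak column of an amp_genes-style table."""
    rows = [line.split("\t") for line in text.split("\n") if line.strip()]
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    peaks = []
    for col in range(1, width):  # column 0 is assumed to hold row labels
        cytoband, q_value, residual_q, boundaries = (rows[r][col] for r in range(4))
        genes = [row[col] for row in rows[4:] if row[col]]
        peaks.append({
            "cytoband": cytoband,
            "q_value": float(q_value),
            "residual_q_value": float(residual_q),
            "wide_peak_boundaries": boundaries,
            "genes": genes,
        })
    return peaks


# Abbreviated example: only a few genes per peak are shown.
EXAMPLE = "\n".join([
    "\t".join(["cytoband", "8q11.23", "15q22.31"]),
    "\t".join(["q value", "0.2211", "0.2211"]),
    "\t".join(["residual q value", "0.2211", "0.2211"]),
    "\t".join(["wide peak boundaries", "chr8:51697150-53706540", "chr15:1-66428823"]),
    "\t".join(["genes in wide peak", "ST18", "BUB1B"]),
    "\t".join(["", "RB1CC1", "TCF12"]),
    "\t".join(["", "", "CLPX"]),
])
peaks = read_peak_columns(EXAMPLE)
for peak in peaks:
    print(peak["cytoband"], peak["genes"])
```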

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the Genepattern SNPViewer module and may be imported into the Integrated Genomics Viewer (IGV).
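
Since the scores file reports Q values as -log10(q), they must be converted back before being compared with the 0.25 cutoff used in this report. A minimal sketch of the conversion in both directions (function names are illustrative):

```python
import math


def q_from_neglog10(value):
    """Convert a -log10(q) score back to a Q value."""
    return 10.0 ** (-value)


def neglog10_from_q(q_value):
    """Convert a Q value to the -log10(q) scale used in the scores file."""
    return -math.log10(q_value)


# The 0.25 significance cutoff corresponds to about 0.602 on the -log10 scale.
print(neglog10_from_q(0.25))
print(q_from_neglog10(1.0))
```

On this scale larger values are more significant, so a locus passes the cutoff when its -log10(q) exceeds roughly 0.602.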

Segmented Copy Number (raw_copy_number.{fig|pdf|png} )

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries, and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by the robust algorithm, and is contained within the wide peak reported in {amp|del}_genes.txt by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions are negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the least deletion value among the markers it covers.
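
The marker-to-gene conversion can be sketched as follows. This reads the "extreme method" as choosing the marker value with the largest absolute magnitude, which is one interpretation of the description above and not necessarily how GISTIC resolves genes that contain both gained and lost markers. Values are in units of (copy number - 2), and the function name is illustrative.

```python
def extreme_value(marker_values):
    """Pick the marker value farthest from zero (copy number - 2 units)."""
    if not marker_values:
        raise ValueError("a gene must cover at least one marker")
    return max(marker_values, key=abs)


# Hypothetical marker values for two genes.
print(extreme_value([0.1, 0.8, 0.3]))    # amplified gene
print(extreme_value([-0.2, -1.1, 0.0]))  # deleted gene
```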

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events, that is, events with lengths less than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplification larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletion greater than the minimum arm-level deletion observed for the sample.
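
The five-level encoding can be expressed as a small function. This is a sketch under stated assumptions: the per-sample low and high cutoffs are passed in as arguments (the values used in the example are hypothetical), the default thresholds are the 0.1 settings of this run, and the comparisons are assumed to be strict.

```python
def threshold_call(value, low_cutoff, high_cutoff,
                   amp_threshold=0.1, del_threshold=0.1):
    """Map a gene-level value (copy number - 2 units) to -2, -1, 0, 1, or 2."""
    if value > high_cutoff:
        return 2   # beyond the sample's arm-level amplifications
    if value > amp_threshold:
        return 1   # above the amplification threshold
    if value < low_cutoff:
        return -2  # beyond the sample's arm-level deletions
    if value < -del_threshold:
        return -1  # beyond the deletion threshold
    return 0


# Hypothetical per-sample cutoffs.
LOW, HIGH = -0.9, 0.7
calls = [threshold_call(v, LOW, HIGH) for v in (1.2, 0.3, 0.0, -0.5, -1.3)]
print(calls)
```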

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish high-level amplifications and deletions (+/-2) from ordinary amplifications and deletions (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. The cutoffs are calculated as the minimum arm-level amplification level less the deletion threshold for deletions, and the maximum arm-level amplification plus the amplification threshold for amplifications.
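
The cutoff calculation described above amounts to two lines of arithmetic per sample. In this sketch the arm-level values are hypothetical, the thresholds default to the 0.1 settings of this run, and the function name is illustrative.

```python
def sample_cutoffs(arm_values, amp_threshold=0.1, del_threshold=0.1):
    """Return (low, high) cutoffs from one sample's arm-level values."""
    low = min(arm_values) - del_threshold   # minimum arm level less the deletion threshold
    high = max(arm_values) + amp_threshold  # maximum arm level plus the amplification threshold
    return low, high


# Hypothetical arm-level values for one sample, in copy number - 2 units.
low, high = sample_cutoffs([-0.8, 0.0, 0.6])
print(low, high)
```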

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks. It can be loaded as a track in a compatible genome browser such as IGV or the UCSC genome browser.

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) identifies significantly amplified/deleted chromosome arms; and (ii) identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
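
The multiple-hypothesis correction step can be illustrated with the Benjamini-Hochberg false discovery rate procedure. This report does not name the correction method, so treating it as Benjamini-Hochberg is an assumption here, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg Q values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone Q values.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical p-values for four loci.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(q_values)
```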

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and a second one comes from an analysis of significant variations among the blood normals in the TCGA data set.
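
Position-based CNV exclusion can be sketched as an interval filter over the markers. The example assumes that a marker is excluded when it falls inside a CNV's wide region with inclusive bounds; the marker names, regions, and function names are hypothetical.

```python
def in_cnv(chromosome, position, cnv_regions):
    """True when the position lies inside any CNV wide region (inclusive)."""
    return any(
        chrom == chromosome and start <= position <= end
        for chrom, start, end in cnv_regions
    )


def filter_markers(markers, cnv_regions):
    """Drop markers that overlap a listed germ line CNV."""
    return [m for m in markers if not in_cnv(m[1], m[2], cnv_regions)]


# Hypothetical CNV wide regions: (chromosome, wide start, wide end).
cnv_regions = [("1", 1000, 2000), ("8", 500, 900)]
# Hypothetical markers: (marker name, chromosome, position).
markers = [("M1", "1", 1500), ("M2", "1", 2500), ("M3", "8", 900), ("M4", "15", 1500)]
kept = filter_markers(markers, cnv_regions)
print([name for name, _, _ in kept])
```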

Download Results

The full results of the analysis summarized in this report can be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or the TCGA Data Coordination Center Portal.

References
[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A Vol. 104:50 (2007)
[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)
[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)
[6] McCarroll et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40:10, 1166-1174 (2008)