LowPass Copy number analysis (GISTIC2)

Esophageal Carcinoma (Primary solid tumor)

17 October 2014 | analyses__2014_10_17

Maintainer Information

Maintained by Spring Yingchun Liu (Broad Institute)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2014): LowPass Copy number analysis (GISTIC2). Broad Institute of MIT and Harvard. doi:10.7908/C13T9G2C

Overview

Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.21 (Firehose task version: 127).

Summary

There were 32 tumor samples used in this analysis: 14 significant arm-level results, 1 significant focal amplification, and 6 significant focal deletions were found.

Results

Focal results

Figure 1. Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1. Amplifications Table - 1 significant amplification found. The candidate genes for each peak are listed after the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband	Q value	Residual Q value	Wide Peak Boundaries	# Genes in Wide Peak
11q13.3	1.5678e-20	1.5678e-20	chr11:69467999-70286916	10

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 11q13.3.

Table S1. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CCND1
MIR548K
FGF3
CTTN
FGF4
PPFIA1
FADD
FGF19
ANO1
ORAOV1

Figure 2. Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 2. Deletions Table - 6 significant deletions found. The candidate genes for each peak are listed after the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband	Q value	Residual Q value	Wide Peak Boundaries	# Genes in Wide Peak
9p21.3	3.4142e-22	3.4142e-22	chr9:21959090-22010600	3
16q23.1	0.0010212	0.0010212	chr16:78589811-78992298	1
2q22.2	0.015854	0.015854	chr2:142380819-142702424	1
3p11.1	0.025299	0.025299	chr3:74312399-94258261	37
7q36.1	0.041608	0.041608	chr7:141204815-159138663	185
6p25.3	0.20887	0.20887	chr6:1-28732470	249
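The wide peak boundaries in Tables 1 and 2 use a chr:start-end notation. As a minimal sketch, the width of each peak can be derived from that string. The function names parse_wide_peak and peak_width are hypothetical, and treating both endpoints as inclusive is an assumption rather than something the report states.

```python
import re


def parse_wide_peak(boundary):
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr[0-9XY]+):(\d+)-(\d+)", boundary)
    if match is None:
        raise ValueError(f"unrecognized boundary: {boundary!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def peak_width(boundary):
    """Width in base pairs, assuming both endpoints are inclusive."""
    _, start, end = parse_wide_peak(boundary)
    return end - start + 1


# Wide peaks copied from Table 2.
for cytoband, boundary in [
    ("9p21.3", "chr9:21959090-22010600"),
    ("6p25.3", "chr6:1-28732470"),
]:
    print(cytoband, peak_width(boundary))
```

Comparing widths this way makes the contrast in Table 2 concrete: the 9p21.3 peak is narrow and holds 3 genes, while the 6p25.3 peak spans tens of megabases and holds 249.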

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 9p21.3.

Table S2. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CDKN2A
CDKN2B
C9orf53

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 16q23.1.

Table S3. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
WWOX

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 2q22.2.

Table S4. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
LRP1B

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 3p11.1.

Table S5. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
NSUN3
U3|ENSG00000212598.1
HTR1F
RNU6ATAC6P
RN7SKP284
snoZ40
MIR5688
LINC00971
RN7SL751P
RN7SKP61
RN7SL647P
RN7SL92P
ZNF717
LINC00960
FAM86DP
RN7SL294P
EPHA3
GBE1
CNTN3
POU1F1
PROS1
ROBO1
ROBO2
CGGBP1
CHMP2B
ZNF654
ARL13B
DHFRL1
CADM2
C3orf38
VGLL3
STX19
FRG2C
MIR1324
MIR4273
MIR3923
MIR4795

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 7q36.1.

Table S6. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
EZH2
LINC00689
MIR595
RN7SL142P
SHH
RN7SKP280
HTR5A
RN7SL845P
RN7SL811P
SNORA26|ENSG00000212590.1
snoU13|ENSG00000238557.1
RNA5SP250
FABP5P3
snoU13|ENSG00000239045.1
RN7SL76P
MIR3907
MIR671
IQCA1P1
CDK5
SSPO
SNORD112|ENSG00000252557.1
RN7SL521P
RNY1
RNY3
RNY4
RNY5
RN7SL569P
RN7SL72P
U3|ENSG00000199370.1
RN7SL456P
RNA5SP249
RN7SL207P
RN7SKP174
RNU6ATAC40P
OR2A9P
OR2A20P
CTAGE15
RN7SL481P
RN7SL535P
OR6W1P
TRBV30
TRBC2
PRSS3P2
TRBV28
TRBV27
TRBV19
TRBV9
TRBV2
PRSS3P3
MOXD2P
OR9A1P
OR9A3P
AGK
AOC1
CASP2
CLCN1
DPP6
EN2
EPHA1
EPHB6
GBX1
MNX1
INSIG1
KCNH2
KEL
NOS3
PIP
PRSS1
TAS2R38
PTPRN2
RARRES2
RHEB
SLC4A2
SMARCD3
SSBP1
VIPR2
XRCC2
ZYX
ARHGEF5
ZNF212
ZNF282
CUL1
MGAM
ASIC3
PDIA4
UBE3C
FAM131B
FAM115A
DNAJB6
ABCF2
FASTK
ABCB8
PAXIP1
CLEC5A
CNTNAP2
GIMAP2
OR2F1
TPK1
ZNF777
TMEM176B
REPIN1
TAS2R3
TAS2R4
PRKAG2
NUB1
TAS2R5
CHPF2
NCAPG2
WDR60
GIMAP4
GIMAP5
TMEM176A
TRPV6
TRPV5
ACTR3B
KIAA1147
ESYT2
ZNF398
KMT2C
GALNT11
LMBR1
NOM1
LRRC61
ZNF767
TMUB1
KRBA1
ZBED6CL
AGAP3
C7orf13
OR9A4
OR9A2
C7orf34
TMEM139
NOBOX
OR2A14
OR6B1
OR2F2
ZNF786
PRSS37
ASB10
PRSS58
RNF32
GIMAP8
CRYGN
ZNF425
ZNF746
ATP6V0E2
RBM33
GALNTL5
GIMAP7
ZNF467
GIMAP1
C7orf33
TAS2R39
TAS2R40
TAS2R41
CNPY1
FAM115C
ZNF775
ATG9B
TAS2R60
BLACE
CTAGE6
OR6V1
OR2A12
OR2A1
WDR86
GSTK1
OR2A25
OR2A5
OR2A7
OR2A42
CTAGE15
OR2A2
ARHGEF35
GIMAP6
WEE2
ZNF862
ACTR3C
CTAGE4
CTAGE8
ZNF783
MIR548F4
MTRNR2L6
MIR5707

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 6p25.3.

Table S7. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
HIST1H4I
IRF4
DEK
LINC00533
ZSCAN12
ZNF192P1
ZSCAN12P1
U3|ENSG00000199851.1
HIST1H2AJ
TRNAI6
HIST1H2AG
HIST1H2BJ
TRNAI2
LINC00240
GUSBP2
ZNF322
HMGN4
HCG11
HIST1H4G
HIST1H4F
HIST1H4E
HIST1H2BF
HIST1H2AD
HIST1H2BC
HIST1H4C
snoU13|ENSG00000238322.1
RNY5P5
CMAHP
RN7SL334P
C6orf62
SNORD46|ENSG00000251830.1
RN7SKP240
LINC00340
RN7SL128P
MBOAT1
ID4
RNA5SP205
snoU13|ENSG00000238458.1
RNA5SP204
STMND1
U3|ENSG00000251793.1
MIR4639
RN7SL332P
NOL7
RN7SKP204
PHACTR1
RN7SKP293
SNORA67|ENSG00000207419.1
snoU13|ENSG00000238896.1
RNA5SP203
C6orf52
GCNT6
LINC00518
RNU6ATAC21P
OFCC1
HULC
snoU13|ENSG00000251762.1
RN7SL554P
LY86
RN7SL221P
MIR3691
RMRPP2
snoU13|ENSG00000238801.1
RNA5SP202
snoU13|ENSG00000252668.1
C6ORF50
RNA5SP201
C6orf195
RN7SL352P
snoU13|ENSG00000238438.1
BMP6
BPHL
BTN1A1
DSP
E2F3
EDN1
SERPINB1
F13A1
FOXF2
FOXC1
GCNT2
GMDS
GMPR
GPLD1
GPX5
HIST1H1C
HIST1H1D
HIST1H1E
HIST1H1B
HIST1H1T
HIST1H2AE
HIST1H2BD
HIST1H2BB
HIST1H1A
HFE
HIVEP1
JARID2
MAK
NEDD9
NQO2
SERPINB6
SERPINB9
PRL
RREB1
ATXN1
SLC17A1
SOX4
SSR1
TFAP2A
TPMT
TUBB2A
ZNF165
ZNF184
ZKSCAN8
ZSCAN9
ALDH5A1
HIST1H2AI
HIST1H2AK
HIST1H2AL
HIST1H2AC
HIST1H2AB
HIST1H2AM
HIST1H2BG
HIST1H2BL
HIST1H2BN
HIST1H2BM
HIST1H2BE
HIST1H2BH
HIST1H2BI
HIST1H2BO
HIST1H3A
HIST1H3D
HIST1H3C
HIST1H3E
HIST1H3I
HIST1H3G
HIST1H3J
HIST1H3H
HIST1H3B
HIST1H4A
HIST1H4D
HIST1H4K
HIST1H4J
HIST1H4H
HIST1H4B
HIST1H4L
RIPK1
PRPF4B
HIST1H3F
GCM2
CD83
CDYL
EEF1E1
FAM65B
KIAA0319
NUP153
RANBP9
SLC17A4
SLC17A2
PRSS16
BTN3A3
BTN2A2
ECI2
TRIM38
CAP2
SCGN
FARS2
SLC17A3
RPP40
BTN3A2
BTN3A1
BTN2A1
SIRT5
OR2B6
FAM50B
MYLIP
ABT1
SLC35B3
GMNN
TBC1D7
NRN1
FAM8A1
DCDC2
TMEM14C
TDP2
GFOD1
BTN2A3P
ELOVL2
CDKAL1
PAK1IP1
LRRC16A
EXOC2
ACOT13
WRNIP1
DUSP22
LYRM4
MRS2
SLC22A23
BLOC1S5
MCUR1
KIF13A
ZSCAN31
ZKSCAN3
ZSCAN16
TXNDC5
OR2B2
TMEM14B
RIOK1
DTNBP1
PGBD1
ADTRP
HIST1H2AH
HIST1H2BK
POM121L2
FOXQ1
SCAND3
HUS1B
NRSN1
SNRNP48
HDGFL1
HIST1H2AA
KDM1B
RBM24
RNF182
SMIM13
SYCP2L
PXDC1
ZSCAN23
NKAPL
FAM217A
RNF144B
HIST1H2BA
GPX6
CAGE1
MYLK4
ZNF391
TUBB2B
KAAG1
NHLRC1
ZKSCAN4
PSMG4
C6orf201
PPP1R3G
MIR548A1
TMEM170B
MIR3143
MIR4645
MIR5689
MIR5683

Arm-level results

Table 3. Arm-level significance table - 14 significant results found. The significance cutoff is at Q value=0.25.

Arm	# Genes	Amp Frequency	Amp Z score	Amp Q value	Del Frequency	Del Z score	Del Q value
1p	1300	0.04	-1.23	0.998	0.23	1.82	0.17
1q	1195	0.31	2.72	0.0259	0.13	-0.0872	0.853
2p	624	0.16	-0.908	0.998	0.04	-2.38	0.995
2q	967	0.13	-0.659	0.998	0.04	-1.93	0.995
3p	644	0.00	-1.8	0.998	0.66	5.8	1.29e-07
3q	733	0.52	3.77	0.00111	0.37	1.67	0.212
4p	289	0.14	-1.44	0.998	0.38	1.13	0.431
4q	670	0.04	-2.06	0.998	0.29	0.909	0.519
5p	183	0.29	-0.109	0.998	0.32	0.197	0.853
5q	905	0.00	-2.01	0.998	0.44	3.7	0.00215
6p	710	0.04	-2.2	0.998	0.19	-0.302	0.853
6q	556	0.00	-2.82	0.998	0.22	-0.294	0.853
7p	389	0.60	4.1	0.000412	0.14	-1.06	0.95
7q	783	0.45	3.27	0.00547	0.16	-0.498	0.881
8p	338	0.26	-0.169	0.998	0.35	0.793	0.535
8q	551	0.63	5.01	1.1e-05	0.15	-0.741	0.881
9p	301	0.16	-1.21	0.998	0.45	1.98	0.159
9q	700	0.19	-0.311	0.998	0.22	0.0594	0.853
10p	253	0.17	-1.34	0.998	0.29	-0.0557	0.853
10q	738	0.04	-2.1	0.998	0.23	0.198	0.853
11p	509	0.05	-2.21	0.998	0.32	0.952	0.519
11q	975	0.09	-1.09	0.998	0.30	1.85	0.17
12p	339	0.42	1.67	0.27	0.29	0.095	0.853
12q	904	0.12	-0.905	0.998	0.21	0.315	0.853
13q	560	0.21	-0.304	0.998	0.46	2.77	0.0278
14q	938	0.25	0.986	0.649	0.16	-0.243	0.853
15q	810	0.07	-1.76	0.998	0.10	-1.36	0.988
16p	559	0.08	-1.88	0.998	0.23	-0.0922	0.853
16q	455	0.12	-1.57	0.998	0.24	-0.199	0.853
17p	415	0.20	-0.726	0.998	0.26	-0.0563	0.853
17q	972	0.28	1.46	0.359	0.12	-0.623	0.881
18p	104	0.20	-1.11	0.998	0.43	1.3	0.354
18q	275	0.11	-1.65	0.998	0.47	2.18	0.118
19p	681	0.07	-1.84	0.998	0.17	-0.712	0.881
19q	935	0.14	-0.501	0.998	0.14	-0.501	0.881
20p	234	0.37	0.876	0.693	0.23	-0.683	0.881
20q	448	0.41	1.91	0.189	0.00	-2.57	0.995
21q	258	0.20	-0.766	0.998	0.59	3.52	0.00287
22q	564	0.25	0.118	0.998	0.31	0.812	0.535
Xq	668	0.33	1.3	0.428	0.33	1.3	0.354
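The Table 3 caption gives a significance cutoff of Q value=0.25. The following sketch shows one way to flag significant arms from the Amp Q value and Del Q value columns. The function name significant_arms is hypothetical, and using a strict "less than" comparison is an assumption, since the report does not say how a Q value exactly equal to the cutoff is treated.

```python
Q_CUTOFF = 0.25  # significance cutoff stated in the Table 3 caption


def significant_arms(rows, cutoff=Q_CUTOFF):
    """Map each significant arm to the event types that pass the cutoff.

    rows holds (arm, amp_q, del_q) tuples. The strict '<' comparison is an
    assumption about how the cutoff is applied.
    """
    hits = {}
    for arm, amp_q, del_q in rows:
        kinds = []
        if amp_q < cutoff:
            kinds.append("amp")
        if del_q < cutoff:
            kinds.append("del")
        if kinds:
            hits[arm] = kinds
    return hits


# (arm, Amp Q value, Del Q value) for a few rows copied from Table 3.
rows = [
    ("1p", 0.998, 0.17),
    ("3q", 0.00111, 0.212),
    ("8q", 1.1e-05, 0.881),
    ("12p", 0.27, 0.853),
]
print(significant_arms(rows))
```

Applied to every row of Table 3, this rule appears to flag 14 distinct arms, with 3q passing the cutoff for both amplification and deletion, which is consistent with the 14 significant arm-level results reported in the Summary.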

Methods & Data

Input

Description

Segmentation File: The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2(copy number) - 1).
Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).
Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.
CNV Files: There are two options for the CNV file. The first option allows CNVs to be identified by marker name. The second option allows the CNVs to be identified by genomic location. Option #1: A two column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A six column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End.
Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.
Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.
Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.
Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.
Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).
Confidence Level: Confidence level used to calculate the region containing a driver.
Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.
Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).
Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.
Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
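The six-column segmentation format described above can be read with a few lines of code. The sketch below is an illustration only: the Segment type and the read_segments and copy_number helpers are hypothetical names, the header is detected by assuming its first cell is "Sample", and Seg.CN is interpreted as log2(copy number) - 1, so that a value of 0 corresponds to two copies. The example rows are invented and do not come from the real input file.

```python
import csv
import io
from typing import NamedTuple


class Segment(NamedTuple):
    sample: str
    chromosome: str
    start: int
    end: int
    num_markers: int
    seg_cn: float


def read_segments(handle):
    """Yield Segment records from a six-column, tab-delimited file."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the optional header (assumed to start with 'Sample').
        if not row or row[0] == "Sample":
            continue
        sample, chrom, start, end, markers, seg_cn = row
        yield Segment(sample, chrom, int(start), int(end), int(markers), float(seg_cn))


def copy_number(seg_cn):
    """Invert Seg.CN = log2(copy number) - 1 to get an absolute copy number."""
    return 2.0 ** (seg_cn + 1.0)


# Invented rows for illustration; not taken from the real segmentation file.
example = io.StringIO(
    "Sample\tChromosome\tStart Position\tEnd Position\tNum markers\tSeg.CN\n"
    "sample_A\t9\t21000000\t22500000\t150\t-1.0\n"
    "sample_A\t11\t69000000\t70500000\t120\t1.0\n"
)
for seg in read_segments(example):
    print(seg.chromosome, seg.num_markers, copy_number(seg.seg_cn))
```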

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

Segmentation File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/ESCA-TP/11541861/segmentationfile.txt
Markers File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/ESCA-TP/11541861/markersfile.txt
Reference Genome = /xchip/cga/reference/gistic2/hg19_GENCODE_v18_20140127.mat
CNV Files = /xchip/cga/reference/gistic2/CNV.hg19.bypos.111213.txt
Amplification Threshold = 0.3
Deletion Threshold = 0.3
Cap Values = 2
Broad Length Cutoff = 0.5
Remove X-Chromosome = 0
Confidence Level = 0.99
Join Segment Size = 10
Arm Level Peel Off = 1
Maximum Sample Segments = 10000
Gene GISTIC = 0
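To make the threshold and cap settings concrete, the sketch below applies the values listed for this run (amplification threshold 0.3, deletion threshold 0.3, cap 2) to individual log2 ratios. The function names cap_value and call_segment are hypothetical, and capping before thresholding is an assumption about the order of operations, not a description of GISTIC internals.

```python
AMP_THRESHOLD = 0.3  # Amplification Threshold from the list of values
DEL_THRESHOLD = 0.3  # Deletion Threshold from the list of values
CAP = 2.0            # Cap Values from the list of values


def cap_value(log2_ratio, cap=CAP):
    """Clamp a log2 ratio to the interval [-cap, +cap]."""
    return max(-cap, min(cap, log2_ratio))


def call_segment(log2_ratio):
    """Label a log2 ratio as amplified, deleted, or neutral.

    Capping first and thresholding second is an assumed order.
    """
    value = cap_value(log2_ratio)
    if value > AMP_THRESHOLD:
        return "amplified"
    if value < -DEL_THRESHOLD:
        return "deleted"
    return "neutral"


for ratio in (3.5, 0.45, 0.1, -0.31, -4.0):
    print(ratio, cap_value(ratio), call_segment(ratio))
```

Note that a ratio exactly equal to a threshold is labeled neutral here, following the wording "above this value" and "below the negative of this value" in the parameter descriptions.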

Table 4. First 10 out of 32 Input Tumor Samples.

Tumor Sample Names
TCGA-IG-A3I8-01A-11D-A248-26
TCGA-IG-A3QL-01A-11D-A248-26
TCGA-IG-A3Y9-01A-12D-A248-26
TCGA-IG-A3YA-01A-11D-A248-26
TCGA-IG-A3YB-01A-11D-A248-26
TCGA-IG-A3YC-01A-11D-A248-26
TCGA-IG-A51D-01A-11D-A267-26
TCGA-L5-A43C-01A-11D-A248-26
TCGA-L5-A43E-01A-11D-A248-26
TCGA-L5-A43H-01A-11D-A248-26
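The Introduction notes that normal samples are filtered out by inspecting the TCGA barcodes. A minimal sketch of such a check follows, assuming the standard TCGA barcode layout in which the fourth hyphen-separated field begins with a two-digit sample type code (01 to 09 for tumor, 10 to 19 for normal). The function names are hypothetical, and this is not the pipeline's own filtering code.

```python
def sample_type_code(barcode):
    """Return the two-digit sample type code from a TCGA barcode."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    # The fourth field combines the sample type code and a vial letter, e.g. '01A'.
    return fields[3][:2]


def is_tumor(barcode):
    """True when the sample type code falls in the tumor range 01-09."""
    return 1 <= int(sample_type_code(barcode)) <= 9


# First barcode from Table 4, plus an invented normal-type barcode for contrast.
print(is_tumor("TCGA-IG-A3I8-01A-11D-A248-26"))
print(is_tumor("TCGA-XX-0000-11A-11D-0000-26"))  # hypothetical barcode
```

All ten barcodes in Table 4 carry the code 01, which corresponds to the "Primary solid tumor" label in the report title.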

Figure 3. Segmented copy number profiles in the input data

Output

All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

Unique Name: A name assigned to identify the region.
Descriptor: The genomic descriptor of that region.
Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.
Peak Limits: The boundaries of the region of maximal amplification or deletion.
Region Limits: The boundaries of the entire significant region of amplification or deletion.
Q values: The Q value of the peak region.
Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.
Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').
Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns varies slightly by section of the file.

The first section can be identified by the key given in column 9 - it starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' The second section exactly reproduces the first section, except that here the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that here only broad events are included. A 1 in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

Cytoband
Q value
Residual Q value
Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.
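Because the amp genes file stores one peak per column rather than one per row, it helps to transpose it when reading. The sketch below is an illustration under stated assumptions: the function name parse_genes_file is hypothetical, the first column is assumed to hold row labels, and the gene list is assumed to start on the fifth row. The example text reuses values from Table 2 and is not a copy of the real output file.

```python
def parse_genes_file(text):
    """Return one dict per peak from column-oriented, tab-delimited text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    width = max(len(row) for row in rows)
    # Pad short rows so every row has a cell for every peak column.
    rows = [row + [""] * (width - len(row)) for row in rows]
    peaks = []
    for col in range(1, width):  # column 0 is assumed to hold row labels
        if not rows[0][col].strip():
            continue  # ignore empty trailing columns
        genes = [row[col].strip() for row in rows[4:] if row[col].strip()]
        peaks.append({
            "cytoband": rows[0][col].strip(),
            "q_value": float(rows[1][col]),
            "residual_q_value": float(rows[2][col]),
            "wide_peak": rows[3][col].strip(),
            "genes": genes,
        })
    return peaks


# Illustrative text built from Table 2; the row labels are assumptions.
example = "\n".join([
    "cytoband\t9p21.3\t16q23.1",
    "q value\t3.4142e-22\t0.0010212",
    "residual q value\t3.4142e-22\t0.0010212",
    "wide peak boundaries\tchr9:21959090-22010600\tchr16:78589811-78992298",
    "genes in wide peak\tCDKN2A\tWWOX",
    "\tCDKN2B\t",
    "\tC9orf53\t",
])
for peak in parse_genes_file(example):
    print(peak["cytoband"], peak["q_value"], peak["genes"])
```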

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the GenePattern SNPViewer module and may be imported into the Integrative Genomics Viewer (IGV).
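Since the scores file stores Q values as -log10(q), they need converting before they can be compared with the Q value=0.25 cutoff used in this report. A small sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def q_from_score(neg_log10_q):
    """Convert a -log10(q) score back to a Q value."""
    return 10.0 ** (-neg_log10_q)


def score_from_q(q):
    """Convert a Q value to the -log10(q) scale."""
    return -math.log10(q)


CUTOFF_Q = 0.25  # significance cutoff used throughout this report

# On the -log10 scale, larger scores mean smaller (more significant) Q values.
print(round(score_from_q(CUTOFF_Q), 3))
print(q_from_score(2.0))
```

On this scale the 0.25 cutoff sits at roughly 0.602, so a locus is significant when its score is above that value.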

Segmented Copy Number (raw_copy_number.{fig|pdf|png} )

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries, and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by robust, and is contained within the wide peak reported in {amp|del}_genes.txt by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions have negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the least deletion value among the markers it covers.

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events with lengths less than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplifications larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletions greater than the minimum arm-level deletion observed for the sample.

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish the high level amplifications (+/-2) from ordinary amplifications (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. The cutoffs are calculated as the minimum arm-level amplification level less the deletion threshold for deletions and the maximum arm-level amplification plus the amplification threshold for amplifications.
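The relationship between the per-sample cutoffs and the five discrete levels in all_thresholded.by_genes.txt can be sketched as follows. This is an illustration, not GISTIC code: the function names are hypothetical, the arm-level values are invented, and the gene value, thresholds, and cutoffs are all assumed to be expressed in the same units.

```python
AMP_THRESHOLD = 0.3  # Amplification Threshold used in this run
DEL_THRESHOLD = 0.3  # Deletion Threshold used in this run


def sample_cutoffs(arm_values):
    """Return (low, high) cutoffs for one sample from its arm-level values.

    low  = minimum arm-level value minus the deletion threshold
    high = maximum arm-level value plus the amplification threshold
    """
    low = min(arm_values) - DEL_THRESHOLD
    high = max(arm_values) + AMP_THRESHOLD
    return low, high


def threshold_gene(value, low, high):
    """Map a gene-level value to one of -2, -1, 0, 1, 2."""
    if value > AMP_THRESHOLD:
        return 2 if value > high else 1
    if value < -DEL_THRESHOLD:
        return -2 if value < low else -1
    return 0


# Invented arm-level values for a single sample.
low, high = sample_cutoffs([-0.4, 0.0, 0.5])
for gene_value in (1.2, 0.5, 0.0, -0.5, -1.0):
    print(gene_value, threshold_gene(gene_value, low, high))
```

With these invented arm values the cutoffs are about -0.7 and 0.8, so a gene at 0.5 is an ordinary amplification (1), while a gene at 1.2 exceeds anything seen at arm level in the sample and is scored 2.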

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks that can be loaded as a track in a compatible genome browser such as IGV or the UCSC genome browser.

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) identifies significantly amplified/deleted chromosome arms; and (ii) identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in-depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
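The paragraph above says the locus-specific significance levels are corrected for multiple hypothesis testing but does not name the procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg false discovery rate procedure, a common way to turn p-values into Q values. Treating it as the exact correction applied in this pipeline is an assumption; see [1] and [3] for the method as published. The p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg Q values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Invented per-locus p-values for illustration.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.25 for q in qvals]  # cutoff used in this report
print(qvals, significant)
```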

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in-depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and a second one comes from an analysis of significant variations among the blood normals in the TCGA data set.
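Excluding germ line CNV regions amounts to an interval filter over marker positions. The sketch below shows the idea for a position-based CNV list, using only the chromosome and wide region start and end. The function names and the example markers and region are invented, and the exact way GISTIC applies the exclusion list internally is not described in this report, so this is an assumption about the general approach.

```python
def in_any_region(chrom, position, regions):
    """True when the position falls inside any (chrom, start, end) region."""
    return any(
        region_chrom == chrom and start <= position <= end
        for region_chrom, start, end in regions
    )


def drop_cnv_markers(markers, cnv_regions):
    """Keep only markers that lie outside every CNV region.

    markers:     list of (marker_name, chromosome, position)
    cnv_regions: list of (chromosome, wide_region_start, wide_region_end)
    """
    return [
        marker for marker in markers
        if not in_any_region(marker[1], marker[2], cnv_regions)
    ]


# Invented markers and CNV region for illustration.
markers = [("m1", "chr1", 100), ("m2", "chr1", 5000), ("m3", "chr2", 5000)]
cnv_regions = [("chr1", 4000, 6000)]
print(drop_cnv_markers(markers, cnv_regions))
```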

Download Results

The full results of the analysis summarized in this report can be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or the TCGA Data Coordination Center Portal.

References

[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A Vol. 104:50 (2007)

[2] GISTIC version 1

[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)

[4] GISTIC version 2

[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)

[6] McCarroll, S. A. et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40(10):1166-1174 (2008)

[7] The Sanger Institute: Cancer Gene Census
