LowPass Copy number analysis (GISTIC2)

Stomach Adenocarcinoma (Primary solid tumor)

02 April 2015 | analyses__2015_04_02

Maintainer Information

Maintained by Spring Yingchun Liu (Broad Institute)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2015): LowPass Copy number analysis (GISTIC2). Broad Institute of MIT and Harvard. doi:10.7908/C1X929DD

Overview

Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.21 (Firehose task version: 127).
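The barcode-based filtering of normal samples can be sketched as follows. This is a minimal illustration, not the Firehose pipeline's code: it assumes the standard TCGA barcode layout, in which the fourth hyphen-separated field starts with a two-digit sample-type code (01-09 for tumor, 10-19 for normal). The function names are hypothetical.

```python
def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code from a TCGA barcode.

    Assumes the layout TCGA-TSS-Participant-SampleVial-..., for example
    TCGA-B7-5816-01A-21D-1598-02, where "01A" carries the sample type.
    """
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return int(fields[3][:2])


def is_tumor(barcode: str) -> bool:
    # Assumed convention: codes 01-09 are tumor types, 10-19 are normals.
    return 1 <= sample_type_code(barcode) <= 9


def keep_tumors(barcodes):
    """Drop non-tumor samples, keeping tumor barcodes in their original order."""
    return [b for b in barcodes if is_tumor(b)]
```

All barcodes listed in this report carry the code 01 (primary solid tumor), so `keep_tumors` would leave them unchanged.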

Summary

There were 107 tumor samples used in this analysis: 21 significant arm-level results, 17 significant focal amplifications, and 5 significant focal deletions were found.

Results

Focal results

Figure 1. Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1. Amplifications Table - 17 significant amplifications found. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband	Q value	Residual Q value	Wide Peak Boundaries	# Genes in Wide Peak
19q12	6.1855e-10	6.1855e-10	chr19:29845736-30461073	6
12q15	3.6411e-06	3.6411e-06	chr12:68032120-70574042	21
8q24.21	4.5013e-06	4.5013e-06	chr8:128070146-128865434	6
12p12.1	0.00022798	0.00022798	chr12:24832434-25872659	8
8p23.1	0.0002302	0.0002302	chr8:10975654-12724285	31
20q13.2	0.00024738	0.00024738	chr20:51006785-53180923	8
7q21.2	0.00067742	0.00067742	chr7:90901353-97945867	58
7p11.2	0.0036851	0.0036851	chr7:54995267-55593325	3
18q11.2	0.0036851	0.0036851	chr18:19444890-20134873	6
10q26.13	0.0066236	0.0066236	chr10:122709888-123893564	5
17q12	0.0050612	0.0071589	chr17:35621565-39490730	94
6p21.1	0.013588	0.013588	chr6:43508397-44379101	23
1q42.3	0.033422	0.033422	chr1:234797952-235344760	6
13q22.1	0.06214	0.06214	chr13:73543082-74325003	6
15q26.1	0.096189	0.096189	chr15:87285740-102531392	116
11q13.3	0.11367	0.11367	chr11:67821083-70533128	26
17q12	0.17647	0.23511	chr17:31682027-36884102	87

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 19q12.

Table S1. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
C19orf12
POP4
CCNE1
URI1
PLEKHF1
VSTM2B

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 12q15.

Table S2. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
MDM2
MYRFL
RN7SL804P
SLC35E3
SNORA70G
DYRK2
CPM
IFNG
LYZ
RAP1B
YEATS4
CCT2
FRS2
CPSF6
IL22
IL26
MDM1
NUP107
RAB3IP
BEST3
LRRC10

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 8q24.21.

Table S3. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
MYC
POU5F1B
CASC8
CCAT1
PCAT2
PVT1

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 12p12.1.

Table S4. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
KRAS
RN7SL38P
BCAT1
LRMP
CASC1
LYRM5
IFLTD1
C12orf77

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 8p23.1.

Table S5. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
LINC00681
FAM86B2
FAM66A
RNA5SP254
FAM66D
RNA5SP253
DEFB130|ENSG00000233050.1
DEFB134
C8orf49
LINC00208
RN7SL293P
C8orf12
LINC00529
BLK
CTSB
FDFT1
GATA4
MTMR9
FAM167A
SLC35G5
FAM86B1
LONRF1
TDH
DEFB130|ENSG00000232948.1
NEIL2
XKR6
USP17L2
DEFB135
DEFB136
ZNF705D
MIR5692A2

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 20q13.2.

Table S6. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
RN7SKP184
CYP24A1
PFDN4
ZNF217
BCAS1
DOK5
TSHZ2
MIR4756

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 7q21.2.

Table S7. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CDK6
AKAP9
RN7SL13P
RN7SL478P
RN7SKP104
RN7SL252P
SHFM1
MIR591
PON1
snoU13|ENSG00000238384.1
RN7SKP129
GNG11
MIR489
MIR653
RN7SL7P
GATAD1
ASNS
CALCR
KRIT1
COL1A2
CYP51A1
DLX5
DLX6
DYNC1I1
GNGT1
OCM2
PDK4
PEX1
PON2
PON3
TAC1
MTERF
TFPI2
SGCE
SLC25A13
BET1
LMTK2
PEG10
BRI3
TECPR1
ASB4
ANKIB1
SAMD9
PPP1R9A
CCDC132
BAIAP2L1
ACN9
CASD1
RBM48
BHLHA15
SAMD9L
HEPACAM2
FAM133B
LRRD1
C7orf76
MIR4652
MIR5692C2
MIR5692A1

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 7p11.2.

Table S8. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
EGFR
LANCL2
VOPP1

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 18q11.2.

Table S9. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
snoU13|ENSG00000238907.1
RNU6ATAC20P
RNA5SP451
GATA6
MIB1
CTAGE1

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 10q26.13.

Table S10. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
FGFR2
RN7SKP167
TACC2
ATE1
NSMCE4A

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 17q12.

Table S11. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
RARA
ERBB2
LASP1
MLLT6
KRT223P
KRT222
KRT222
RNA5SP441
GJD3
RNY4P8
SNORD124
IKZF3
MIR4728
PNMT
TCAP
NEUROD2
ARL5C
LINC00672
SNORA21|ENSG00000199293.1
SNORA21|ENSG00000252699.1
RPL23
RNA5SP440
CISD3
MIR4726
RN7SL102P
RN7SL458P
SOCS7
ACACA
CACNB1
CDC6
CCR7
CSF3
GRB7
IGFBP4
KRT10
KRT12
MED1
PSMB3
PSMD3
RPL19
SMARCE1
TADA2A
HNF1B
THRA
TOP2A
PCGF2
PIP4K2B
NR1D1
MED24
STARD3
DDX52
DUSP14
SYNRG
CASC3
KRT23
RAPGEFL1
CDK12
KRT20
CWC25
GSDMB
PLXDC1
ARHGAP23
SRCIN1
PPP1R1B
TBC1D3F
MIEN1
MRPL45
TNS4
FBXL20
PGAP3
ORMDL3
ZPBP2
KRT40
WIPF2
KRT25
TMEM99
KRT28
KRT24
C17orf78
GSDMA
MSL1
KRT27
LRRC37A11P
STAC2
KRT26
C17orf98
KRT39
GPR179
FBXO47
TBC1D3
C17orf96
LRRC3C
MIR4734
MIR4727

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 6p21.1.

Table S12. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
TCTE1
TMEM151B
RSPH9
SCARNA15|ENSG00000252218.1
CDC5L
SLC29A1
HSP90AB1
NFKBIE
POLH
VEGFA
MAD2L1BP
CAPN11
GTPBP2
MRPS18A
TMEM63B
AARS2
XPO5
MRPL14
SPATS1
C6orf223
SLC35B2
TMEM151B
MIR4647

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 1q42.3.

Table S13. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
SNORA14B
RN7SL668P
RNY4P16
TOMM20
RBM34
ARID4B

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 13q22.1.

Table S14. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
LINC00392
LINC00393
RNY1P8
KLF5
PIBF1
KLF12

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 15q26.1.

Table S15. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
BLM
IDH2
NTRK3
CRTC3
DDX11L9
WASH3P
FAM138E
OR4F13P
RN7SL209P
DNM1P47
snoU13|ENSG00000238502.1
LINS
RNA5SP402
RN7SL484P
DKFZP779J2370
MIR4714
RNA5SP401
RN7SL677P
RN7SKP181
RN7SKP254
MIR1469
LINC00924
MIR3175
RN7SL599P
LINC00930
snoU109|ENSG00000239197.1
snoU13|ENSG00000238981.1
RN7SL363P
SNORD18|ENSG00000200677.1
ZNF774
GABARAPL3
RN7SL736P
CIB1
RN7SL346P
RN7SL755P
C15orf38
MIR5009
MIR5094
MESP1
LINC00928
LINC00925
ISG20
ACAN
ALDH1A3
ANPEP
CHD2
FES
IGF1R
MAN2A2
MEF2A
MFGE8
FURIN
PCSK6
PLIN1
POLG
RLBP1
SNRPA1
NR2F2
ST8SIA2
PEX11A
IQGAP1
PRC1
SV2B
AP3S2
SEMA4B
ABHD2
CHSY1
SYNM
VPS33B
MRPL46
OR4F4
SLCO3A1
NGRN
RHCG
DET1
FANCI
MCTP2
VIMP
UNC45A
RGMA
WDR93
AEN
TTC23
MRPS11
LRRK1
TM2D3
TICRR
RCCD1
ARRDC4
LINC00923
TARSL2
LRRC28
AGBL1
ASB7
LYSMD4
PGPEP1L
C15orf32
HAPLN3
MESP2
SPATA8
LINC00052
ADAMTS17
DNM1P46
CERS3
FAM169B
KIF7
ZNF710
HDDC3
SPATA41
GDPGP1
OR4F6
OR4F15
FAM174B
TTLL13
MIR1179
MIR3174

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 11q13.3.

Table S16. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CCND1
MIR548K
FGF3
MIR3164
TPCN2
CHKA
CPT1A
CTTN
FGF4
IGHMBP2
LRP5
PPFIA1
FADD
MTL5
FGF19
SHANK2
MYEOV
GAL
SUV420H1
C11orf24
ANO1
PPP6R3
MRGPRD
MRGPRF
MRPL21
ORAOV1

Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 17q12.

Table S17. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
MLLT6
TAF15
MIR4726
RN7SL102P
RN7SL458P
SOCS7
HMGB1P24
MIR2909
RNA5SP439
CCL4L2
RN7SL301P
CCL4
CCL18
RN7SKP274
SNORD7
SLFN12L
SLFN5
Vault|ENSG00000252328.1
snoU13|ENSG00000238858.1
RNA5SP438
ACACA
ASIC2
AP2B1
LHX1
LIG3
PEX12
RAD51D
CCL1
CCL2
CCL3
CCL3L1
CCL5
CCL7
CCL8
CCL11
CCL13
CCL14
CCL15
CCL16
CCL23
TADA2A
HNF1B
ZNHIT3
CCL4L1
CCT6B
DDX52
DUSP14
SYNRG
AATF
NLE1
FNDC8
SLFN12
LYZL6
ARHGAP23
MMP28
DHRS11
GGNBP2
MRM1
MYO19
SRCIN1
TBC1D3F
MRPL45
ZNF830
SLFN11
RASL10B
RFFL
TMEM132E
C17orf50
SLFN13
SLC35G3
UNC45B
RDM1
GAS2L2
C17orf66
PIGW
C17orf78
SLFN14
C17orf102
TBC1D3B
TBC1D3C
CCL3L3
GPR179
TBC1D3G
TBC1D3
TBC1D3H
C17orf96
MIR4734

Figure 2. Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 2. Deletions Table - 5 significant deletions found. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband	Q value	Residual Q value	Wide Peak Boundaries	# Genes in Wide Peak
5q12.1	0.034943	0.034943	chr5:59050959-59380706	1
9p23	0.034943	0.034943	chr9:9455034-9627050	1
16q23.1	0.034943	0.034943	chr16:78583111-78667556	1
9p21.3	0.039807	0.039807	chr9:21813062-22034981	4
6p25.3	0.16428	0.16428	chr6:1867395-2191114	1

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 5q12.1.

Table S18. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
PDE4D

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 9p23.

Table S19. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
PTPRD

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 16q23.1.

Table S20. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
WWOX

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 9p21.3.

Table S21. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CDKN2A
CDKN2B
MTAP
C9orf53

Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 6p25.3.

Table S22. Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
GMDS

Arm-level results

Table 3. Arm-level significance table - 21 significant results found. The significance cutoff is at Q value=0.25.

Arm	# Genes	Amp Frequency	Amp Z score	Amp Q value	Del Frequency	Del Z score	Del Q value
1p	1300	0.03	-0.415	1	0.07	1.57	0.217
1q	1195	0.09	1.94	0.106	0.05	0.165	0.869
2p	624	0.05	-1.8	1	0.01	-3.02	0.999
2q	967	0.04	-1.2	1	0.03	-1.57	0.999
3p	644	0.03	-2.21	1	0.10	-0.0483	0.903
3q	733	0.11	0.616	0.808	0.04	-1.64	0.999
4p	289	0.01	-3.31	1	0.18	1.51	0.217
4q	670	0.00	-3.08	1	0.16	2.24	0.062
5p	183	0.17	0.972	0.602	0.10	-1.08	0.999
5q	905	0.05	-0.74	1	0.14	2.45	0.0555
6p	710	0.07	-0.803	1	0.05	-1.44	0.999
6q	556	0.05	-1.81	1	0.07	-1.21	0.999
7p	389	0.32	6.32	1.05e-09	0.01	-2.81	0.999
7q	783	0.30	7.8	4.14e-14	0.04	-1.42	0.999
8p	338	0.27	4.17	9.97e-05	0.23	2.85	0.022
8q	551	0.43	10.5	0	0.11	0.0873	0.886
9p	301	0.08	-1.21	1	0.24	3.32	0.00964
9q	700	0.08	-0.34	1	0.10	0.293	0.814
10p	253	0.16	0.668	0.808	0.12	-0.382	0.998
10q	738	0.06	-0.966	1	0.09	0.00336	0.903
11p	509	0.06	-1.45	1	0.12	0.289	0.814
11q	975	0.06	-0.272	1	0.08	0.47	0.751
12p	339	0.12	-0.197	1	0.06	-1.83	0.999
12q	904	0.09	0.492	0.808	0.03	-1.64	0.999
13q	560	0.21	3.53	0.000907	0.04	-2.1	0.999
14q	938	0.01	-2.37	1	0.07	-0.192	0.96
15q	810	0.02	-2.2	1	0.10	0.83	0.508
16p	559	0.06	-1.28	1	0.14	1.09	0.406
16q	455	0.03	-2.4	1	0.17	1.87	0.138
17p	415	0.05	-2.01	1	0.22	3.3	0.00964
17q	972	0.08	0.459	0.808	0.06	-0.282	0.978
18p	104	0.15	0.222	0.969	0.11	-1.03	0.999
18q	275	0.08	-1.37	1	0.21	2.34	0.0555
19p	681	0.11	0.513	0.808	0.16	2.39	0.0555
19q	935	0.16	3.55	0.000907	0.10	1.03	0.406
20p	234	0.37	6.81	4.81e-11	0.09	-1.2	0.999
20q	448	0.49	12.1	0	0.02	-2.26	0.999
21q	258	0.02	-2.89	1	0.24	3.19	0.00964
22q	564	0.01	-2.92	1	0.15	1.55	0.217
Xq	668	0.22	4.12	0.00011	0.13	1.03	0.406

Methods & Data

Input

Description

Segmentation File: The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six-column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2 of copy number, minus 1).
Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).
Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.
CNV Files: There are two options for the cnv file. The first option allows CNVs to be identified by marker name. The second option allows the CNVs to be identified by genomic location. Option #1: A two column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A 6 column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End
Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.
Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.
Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.
Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.
Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).
Confidence Level: Confidence level used to calculate the region containing a driver.
Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.
Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).
Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.
Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
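The segmentation format, cap values, and amplification/deletion thresholds described above can be illustrated with a short sketch. It is not GISTIC's implementation; the function names and dictionary keys are hypothetical, and the parser assumes the six columns appear in the order listed above.

```python
import csv
import io


def read_segments(text, has_header=True):
    """Parse six-column, tab-delimited segmentation text into dicts."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if has_header:
        rows = rows[1:]  # skip the optional column-name line
    segments = []
    for sample, chrom, start, end, n_markers, seg_cn in rows:
        segments.append({
            "sample": sample,
            "chrom": chrom,  # kept as text so that "X" is accepted
            "start": int(start),
            "end": int(end),
            "num_markers": int(n_markers),
            "seg_cn": float(seg_cn),
        })
    return segments


def cap_value(seg_cn, cap):
    """Clamp a log2 ratio to the range [-cap, cap]; cap must be positive."""
    return max(-cap, min(cap, seg_cn))


def classify(seg_cn, amp_threshold, del_threshold):
    """Label a log2 ratio as 'amp', 'del', or 'neutral'."""
    if seg_cn > amp_threshold:
        return "amp"
    if seg_cn < -del_threshold:
        return "del"
    return "neutral"


def copy_number(seg_cn):
    """Invert Seg.CN = log2(copy number) - 1."""
    return 2.0 ** (seg_cn + 1.0)
```

With the values used in this run (thresholds of 0.3, cap of 2), a segment with Seg.CN 0.5 would be labeled 'amp', and a Seg.CN of 3.0 would be capped to 2.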

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

Segmentation File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/STAD-TP/15089888/segmentationfile.txt
Markers File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/STAD-TP/15089888/markersfile.txt
Reference Genome = /xchip/cga/reference/gistic2/hg19_GENCODE_v18_20140127.mat
CNV Files = /xchip/cga/reference/gistic2/CNV.hg19.bypos.111213.txt
Amplification Threshold = 0.3
Deletion Threshold = 0.3
Cap Values = 2
Broad Length Cutoff = 0.5
Remove X-Chromosome = 0
Confidence Level = 0.99
Join Segment Size = 10
Arm Level Peel Off = 1
Maximum Sample Segments = 10000
Gene GISTIC = 0

Table 4. First 10 out of 107 Input Tumor Samples.

Tumor Sample Names
TCGA-B7-5816-01A-21D-1598-02
TCGA-B7-5818-01A-11D-1598-02
TCGA-BR-4183-01A-02D-1128-02
TCGA-BR-4184-01A-01D-1128-02
TCGA-BR-4187-01A-01D-1128-02
TCGA-BR-4188-01A-01D-1128-02
TCGA-BR-4191-01A-02D-1128-02
TCGA-BR-4201-01A-01D-1128-02
TCGA-BR-4253-01A-01D-1128-02
TCGA-BR-4255-01A-01D-1128-02

Figure 3. Segmented copy number profiles in the input data

Output

All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

Unique Name: A name assigned to identify the region.
Descriptor: The genomic descriptor of that region.
Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.
Peak Limits: The boundaries of the region of maximal amplification or deletion.
Region Limits: The boundaries of the entire significant region of amplification or deletion.
Q values: The Q value of the peak region.
Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.
Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').
Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns vary slightly by section of the file. The first section can be identified by the key given in column 9: it starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' The second section exactly reproduces the first section, except that here the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that here only broad events are included. A '1' in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.
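As a rough sketch of how the sections described above might be separated, the helper below splits data rows by the text in column 9. It assumes the rows have already been read into lists of strings with the header row removed. The function names are hypothetical, and the sketch does not try to distinguish the first section from the final broad-event section, because the description above does not give a key for that.

```python
ACTUAL_KEY = "Actual Copy Change Given"


def split_lesion_rows(rows):
    """Separate rows whose column 9 reads 'Actual Copy Change Given'.

    rows: data rows (header excluded), each a list of strings.
    Returns (actual, other); `other` holds the summarized (0/1/2) rows
    and the broad-event rows, which this sketch does not tell apart.
    """
    actual, other = [], []
    for row in rows:
        key = row[8].strip()  # column 9 is zero-based index 8
        if key.startswith(ACTUAL_KEY):
            actual.append(row)
        else:
            other.append(row)
    return actual, other


def sample_values(header, row):
    """Map sample names (columns 10 onward) to that row's values."""
    return dict(zip(header[9:], row[9:]))
```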

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

Cytoband
Q value
Residual Q value
Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.
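Because the genes files are column-oriented (one column per peak), reading them usually means transposing. The sketch below shows one way to do that. It assumes the first column of each row holds a row label rather than peak data, which is an assumption about the file layout and not something stated above; the function names and dictionary keys are hypothetical.

```python
def _cell(row, col):
    """Return the stripped cell at `col`, or '' if the row is too short."""
    return row[col].strip() if col < len(row) else ""


def parse_genes_file(text):
    """Parse column-oriented amp_genes/del_genes text into one dict per peak."""
    lines = [ln.split("\t") for ln in text.splitlines() if ln.strip()]
    header_rows, gene_rows = lines[:4], lines[4:]
    # Assumption: column 0 is a row label, so peaks start at column 1.
    n_cols = max(len(r) for r in header_rows)
    peaks = []
    for col in range(1, n_cols):
        cytoband = _cell(header_rows[0], col)
        if not cytoband:
            continue  # skip empty padding columns
        peaks.append({
            "cytoband": cytoband,
            "q_value": float(_cell(header_rows[1], col)),
            "residual_q_value": float(_cell(header_rows[2], col)),
            "wide_peak_boundaries": _cell(header_rows[3], col),
            "genes": [_cell(r, col) for r in gene_rows if _cell(r, col)],
        })
    return peaks
```

Each returned dict then corresponds to one row of Table 1 or Table 2 together with its gene list.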

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the GenePattern SNPViewer module and may be imported into the Integrative Genomics Viewer (IGV).

Segmented Copy Number (raw_copy_number.{fig|pdf|png} )

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries, and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by the robust boundary method; it is contained within the wide peak reported in {amp|del}_genes.txt, which extends it by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions are negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the least deletion value among the markers it covers.
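The "extreme method" mentioned above can be sketched as picking, for each gene, the marker value farthest from zero. This reading is an assumption: the description does not say how a gene whose markers include both gains and losses is handled, and the function names here are hypothetical.

```python
def extreme_value(marker_values):
    """Return the marker value farthest from zero (ties keep the first)."""
    if not marker_values:
        raise ValueError("gene covers no markers")
    return max(marker_values, key=abs)


def gene_level(values_by_marker, markers_by_gene):
    """Collapse marker-level values (copy number - 2) to one value per gene.

    values_by_marker: dict mapping marker name to value for one sample.
    markers_by_gene: dict mapping gene name to the markers it covers.
    """
    return {
        gene: extreme_value([values_by_marker[m] for m in markers])
        for gene, markers in markers_by_gene.items()
    }
```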

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events, that is, events with lengths less than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplification larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletion greater than the minimum arm-level deletion observed for the sample.

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish high-level amplifications and deletions (+/-2) from ordinary amplifications and deletions (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. The cutoffs are calculated as the minimum arm-level amplification level less the deletion threshold for deletions and the maximum arm-level amplification plus the amplification threshold for amplifications.
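The cutoff rule and the resulting -2 to +2 calls can be written out as a small sketch. It is an illustration under simplifying assumptions, not GISTIC's code: all values are treated as being in the same units, and the function names are hypothetical.

```python
def sample_cutoffs(arm_levels, amp_threshold, del_threshold):
    """Return (low, high) cutoffs for one sample from its arm-level values."""
    low = min(arm_levels) - del_threshold
    high = max(arm_levels) + amp_threshold
    return low, high


def thresholded_call(value, low, high, amp_threshold, del_threshold):
    """Map a gene-level value to a discrete call in {-2, -1, 0, 1, 2}."""
    if value > amp_threshold:
        return 2 if value > high else 1
    if value < -del_threshold:
        return -2 if value < low else -1
    return 0
```

For example, arm-level values of -0.5, 0.0, and 0.5 with both thresholds at 0.25 give cutoffs of -0.75 and 0.75, so a gene value of 1.0 is called 2 and a value of 0.5 is called 1.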

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks that can be loaded as a track in a compatible genome browser such as IGV or the UCSC Genome Browser.

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) identifies significantly amplified/deleted chromosome arms; and (ii) identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
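The multiple-hypothesis correction step can be illustrated with the Benjamini-Hochberg procedure, a standard way of turning p values into false-discovery-rate Q values. The text above does not name the correction used, so treating it as Benjamini-Hochberg is an assumption of this sketch, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def significant(q_values, cutoff=0.25):
    """Indices of loci whose q value falls below the cutoff."""
    return [i for i, q in enumerate(q_values) if q < cutoff]
```

The default cutoff of 0.25 mirrors the significance threshold used for the figures and tables in this report.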

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and a second one comes from an analysis of significant variations among the blood normals in the TCGA data set.

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References

[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A. Vol. 104:50 (2007)

[2] GISTIC version 1

[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)

[4] GISTIC version 2

[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)

[6] McCarroll, S. A. et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40(10):1166-1174 (2008)

[7] The Sanger Institute: Cancer Gene Census
