LowPass Copy number analysis (GISTIC2)
Prostate Adenocarcinoma (Primary solid tumor)
23 September 2013  |  analyses__2013_09_23
Maintainer Information
Maintained by Spring Yingchun Liu (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2013): LowPass Copy number analysis (GISTIC2). Broad Institute of MIT and Harvard. doi:10.7908/C17D2SGX
Overview
Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.19 (Firehose task version: 125).
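
The barcode-based filtering step can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: it assumes the standard TCGA barcode layout, in which the fourth hyphen-separated field starts with a two-digit sample type code (01 to 09 for tumor types, 10 to 19 for normal types). The function names and the second barcode are hypothetical.

```python
def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample type code from a TCGA barcode."""
    fields = barcode.split("-")
    # Field 4 holds the sample type (two digits) plus a vial letter, e.g. "01A".
    return int(fields[3][:2])


def is_tumor(barcode: str) -> bool:
    """True when the sample type code falls in the tumor range (01-09)."""
    return 1 <= sample_type_code(barcode) <= 9


barcodes = [
    "TCGA-CH-5743-01A-21D-1572-02",  # tumor sample listed in Table 4
    "TCGA-XX-0000-11A-01D-0000-02",  # hypothetical normal-tissue barcode
]

# Keep only tumor samples, dropping anything coded as normal.
tumors = [b for b in barcodes if is_tumor(b)]
```

Only the first barcode survives the filter, because its sample type code is 01 (primary solid tumor).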

Summary

There were 15 tumor samples used in this analysis: 10 significant arm-level results, 1 significant focal amplification, and 3 significant focal deletions were found.

Results
Focal results

Figure 1.  Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1.  Amplifications table: 1 significant amplification found. The comprehensive list of candidate genes in the wide peak follows the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband  Q value  Residual Q value  Wide Peak Boundaries      # Genes in Wide Peak
8q23.1    0.04721  0.04721           chr8:93935184-146364022   329
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 8q23.1.

Table S1.  Genes in bold are cancer genes as defined by The Sanger Institute's Cancer Gene Census [7].

Genes
COX6C
EXT1
MYC
RECQL4
hsa-mir-1234
hsa-mir-939
hsa-mir-661
hsa-mir-937
hsa-mir-1302-7
hsa-mir-151
hsa-mir-30d
hsa-mir-1208
hsa-mir-1207
hsa-mir-1205
hsa-mir-1204
hsa-mir-548d-1
hsa-mir-2053
hsa-mir-548a-3
hsa-mir-3151
hsa-mir-1273
hsa-mir-875
hsa-mir-3150
ADCY8
ANGPT1
ANXA13
ATP6V1C1
BAI1
CDH17
CYC1
CYP11B1
CYP11B2
DPYS
EEF1D
GEM
GLI4
GML
GPR20
GPT
GRINA
HAS2
HSF1
EIF3E
KCNQ3
KCNS2
LY6E
LY6H
MATN2
NDUFB9
TONSL
NOV
ODF1
TNFRSF11B
ENPP2
PLEC
POLR2K
POU5F1B
PTK2
PVT1
RAD21
RPL8
RPL30
SDC2
ST3GAL1
SLA
SNTB1
SPAG1
SQLE
STK3
TAF2
TG
KLF10
TRHR
TRPS1
TSTA3
COL14A1
UQCRB
YWHAZ
ZNF7
ZNF16
PSCA
FZD6
LY6D
JRK
EIF3H
DGAT1
GPAA1
WISP1
FOXH1
CCNE2
EBAG9
LRRC14
TTC35
RIMS2
MTSS1
PTDSS1
ZNF623
KIAA0196
HHLA1
TRIB1
HRSP12
NDRG1
PGCP
COLEC10
KHDRBS3
POP1
PTP4A3
RNF139
ZHX1
PUF60
ZHX2
DENND3
ZC3H3
EFR3A
ARC
BOP1
ZFPM2
SCRIB
LRRC6
RAD54B
DCAF13
RNF19A
KIAA1429
RGS22
FBXL6
SNORA72
OPLAH
PABPC1
KCNV1
MTBP
EIF2C2
COMMD5
MRPL13
ATAD2
ASAP1-IT1
CPSF1
LRP12
RRM2B
CYHR1
ASAP1
MTERFD1
FAM135B
PHF20L1
ZNF706
VPS28
FAM203A
KCNK9
C8orf55
UBR5
FAM49B
AZIN1
CHRAC1
EXOSC4
PDP1
LY6K
ESRP1
TRMT12
OXR1
WDYHV1
LAPTM4B
C8orf39
SLC39A4
SYBU
INTS8
GSDMC
ENY2
SLURP1
SLC45A4
ZFAT
ZNF250
DEPTOR
PYCRL
C8orf33
LYNX1
C8orf51
DSCC1
DERL1
GPR172A
PLEKHF2
GSDMD
NIPAL2
BAALC
ZNF696
GRHL2
ARHGAP39
ZNF34
SLC25A32
TM7SF4
SHARPIN
EPPK1
SCRT1
TRAPPC9
TATDN1
NACAP1
NCALD
MAF1
UTP23
PARP10
C8orf76
TIGD5
NUDCD1
FAM83A
PPP1R16A
TSPYL5
MED30
ZNF251
KIFC2
TMEM67
MTDH
PKHD1L1
NAPRT1
WDR67
HPYR1
TP53INP1
MFSD3
MAL2
CSMD3
RHPN1
FBXO32
CTHRC1
OSR2
TOP1MT
ZNF572
FAM92A1
C8orf38
ABRA
LYPD2
TMEM71
TMEM65
LOC157381
C8orf56
ANKRD46
FAM84B
C8orf37
VPS13B
TMEM74
FAM91A1
SLC30A8
COL22A1
SNX31
ADCK5
TSNARE1
C8orf47
MAPK15
NSMCE2
ZNF707
BREA2
FAM83H
LOC286094
ZNF252
TMED10P1
C8orf77
C8orf31
ZFP41
C8orf83
DPY19L4
FBXO43
GPIHBP1
KLHL38
NRBP2
ZNF517
KIAA1875
C8ORFK29
RSPO2
SPATC1
LOC389676
RBM12B
FLJ43860
MAFA
GDF6
SAMD12
MIR30B
MIR30D
C8orf82
FER1L6-AS1
FLJ42969
C8orf85
LRRC24
SAMD12-AS1
ZFAT-AS1
HAS2-AS1
C8orf69
LINC00051
C8orf73
SCXB
LINC00535
RAD21-AS1
FER1L6
MIR599
MIR661
LOC727677
HEATR7A
LOC728724
OC90
LOC731779
MIR875
MIR937
MIR939
LOC100128338
SCXA
LOC100130231
CCDC166
LOC100131726
LOC100133669
LOC100288181
LOC100288748
MIR1205
MIR1206
MIR1207
MIR1204
MIR1234
MIR2053
MIR1208
MIR3150A
MIR3151
LOC100499183
LOC100500773
MIR3150B
MIR3610
LOC100507117
ZHX1-C8ORF76
MIR378D2
MIR4663
MIR4472-1
MIR4664
MIR4471
LOC100616530
PCAT1
LINC00536
FSBP

Figure 2.  Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 2.  Deletions table: 3 significant deletions found. The comprehensive lists of candidate genes in each wide peak follow the table. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband  Q value  Residual Q value  Wide Peak Boundaries       # Genes in Wide Peak
3q26.31   0.10139  0.10139           chr3:171086049-198022430   197
6q14.1    0.10139  0.10139           chr6:81745283-127847770    205
13q13.3   0.10139  0.10139           chr13:37890423-95333015    191
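
The wide peak boundaries are given as `chrN:start-end` strings. As a small sketch (the helper name `parse_region` is hypothetical, and the width is taken as a simple end-minus-start difference), they can be split into numeric coordinates to compare the sizes of the three deletion peaks:

```python
import re


def parse_region(region: str):
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = re.fullmatch(r"chr(\w+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


# Wide peak boundaries copied from Table 2.
peaks = {
    "3q26.31": "chr3:171086049-198022430",
    "6q14.1": "chr6:81745283-127847770",
    "13q13.3": "chr13:37890423-95333015",
}

# Width of each wide peak in base pairs (end minus start).
widths = {}
for cytoband, region in peaks.items():
    chrom, start, end = parse_region(region)
    widths[cytoband] = end - start
```

All three wide peaks span tens of megabases, which is why each contains roughly 200 genes.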
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 3q26.31.

Table S2.  Genes in bold are cancer genes as defined by The Sanger Institute's Cancer Gene Census [7].

Genes
BCL6
EIF4A2
ETV5
LPP
PIK3CA
SOX2
TFRC
hsa-mir-922
hsa-mir-570
hsa-mir-3137
hsa-mir-944
hsa-mir-28
hsa-mir-1248
hsa-mir-1224
ACTL6A
AHSG
APOD
BDH1
AP2M1
CLCN2
CPN2
CRYGS
DGKG
DLG1
DVL3
ECT2
EHHADH
EIF4G1
EPHB3
FGF12
GHSR
GP5
HRG
HES1
IL1RAP
KNG1
MFI2
MUC4
NDUFB5
OPA1
PAK2
PCYT1A
PLD1
POLR2H
PPP1R2
MASP1
PSMD2
RFC4
SNORA63
RPL35A
TRA2B
ST6GAL1
SST
THPO
FXR1
TP63
CHRD
TNFSF10
EIF2B5
USP13
CLDN1
MAP3K13
ADIPOQ
KIAA0226
ECE2
ABCC5
TNK2
ALG3
KCNMB2
IGF2BP2
CLDN16
NLGN1
NCBP2
TNIK
MCF2L2
ATP11B
VPS8
ACAP2
UBXN7
FETUB
LAMP3
KCNMB3
ZNF639
PEX5L
DNAJB11
DCUN1D1
KLHL24
PIGX
TBCCD1
LEPREL1
ABCF3
LSG1
PARL
MFN1
YEATS2
MCCC1
HRASLS
MRPL47
NCEH1
SENP2
GNB4
RTP4
MAGEF1
ZMAT3
FNDC3B
ATP13A3
TBL1XR1
MAP6D1
PIGZ
SPATA16
B3GNT5
IQCG
ATP13A4
FYTTD1
MGC2889
LRCH3
CEP19
LMLN
KLHL6
VWA5B2
TMEM41A
TMEM44
CAMK2N2
TM4SF19
RPL39L
DNAJC19
FAM131A
ZDHHC19
LRRC15
FAM43A
TMEM207
RTP1
TTC14
MB21D2
XXYLT1
CCDC50
PYDC2
LOC152217
RNF168
HTR3C
LIPH
HTR3D
OSTalpha
FBXO45
MUC20
SENP5
LOC220729
LOC253573
NAALADL2
TCTEX1D2
C3orf43
SDHAP1
UTS2D
HTR3E
C3orf70
TPRG1
CCDC39
LOC339926
LPP-AS2
LOC344887
RTP2
OSTN
ATP13A5
SOX2-OT
WDR53
ANKRD18DP
LRRC33
TMEM212
FLJ46066
FLJ42393
FLJ34208
LOC401109
SNORD2
SNORA4
C3orf65
GMNC
LOC647323
SNORA81
SNORD66
MIR570
SDHAP2
FAM157A
MIR922
MIR944
LOC100128023
LOC100131551
LOC100131635
SNAR-I
MIR1224
MIR1248
LOC100505687
MFI2-AS1
LOC100507086
LOC100507391
TM4SF19-TCTEX1D2
MIR4797
MIR4789
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 6q14.1.

Table S3.  Genes in bold are cancer genes as defined by The Sanger Institute's Cancer Gene Census [7].

Genes
PRDM1
ROS1
STL
GOPC
hsa-mir-588
hsa-mir-3144
hsa-mir-548b
hsa-mir-587
hsa-mir-2113
AIM1
AMD1
CCNC
CGA
CNR1
COL10A1
EPHA7
FABP7
FOXO3
FRK
FYN
GABRR1
GABRR2
GJA1
GPR6
GRIK2
HDAC2
HSF2
HTR1E
KPNA5
LAMA4
MARCKS
MAN1A1
ME1
NT5E
PGM3
PLN
POU3F2
PREP
PKIB
REV3L
SIM1
SMPD2
MAP3K7
NR2E1
TPBG
TPD52L1
TSPYL1
DDO
SNX3
RNGTT
CD164
WISP3
WASF1
TBX18
FHL5
ATG5
KIAA0408
ZBTB24
SNAP91
FIG4
CASP8AP2
TRDN
SYNCRIP
SLC35A1
FUT9
TRAF3IP2
SMPDL3A
PNRC1
ASCC3
BVES
SEC63
KIAA1009
ANKRD6
DOPEY1
ZNF292
CDK19
MDN1
TSPYL4
UFL1
HEY2
ORC3
BRD7P3
ASF1A
PNISR
IBTK
FBXL4
SNORD50A
SESN1
OSTM1
NDUFAF4
DSE
HDDC2
CYB5R4
TUBE1
C6orf203
CDC40
RWDD1
UBE2J1
COQ3
SOBP
AKIRIN2
QRSL1
AKIRIN2-AS1
FAM46A
ECHDC1
RARS2
PDSS2
C6orf162
LYRM2
SNX14
SERINC1
HACE1
BEND3
RRAGD
PRDM13
BACH2
TRMT11
C6orf164
POPDC3
MICAL1
FAM184A
MANEA
GPR63
SPACA1
RNF146
ARMC2
RPF2
MCHR2
FAXC
GJA10
RTN4IP1
RSPO3
USP45
SLC22A16
UBE2CBP
KIAA1919
GTF3C6
MRAP2
RWDD2A
KLHL32
NUS1
SLC16A10
RIPPLY2
CLVS2
NCOA7
HINT3
PM20D2
SRSF12
RNF217
NKAIN2
C6orf165
BVES-AS1
PRSS35
C6orf163
CCDC162P
AKD1
NT5DC1
FAM26D
ZUFSP
FAM162B
C6orf170
HS3ST5
GPRC6A
RFX6
SLC35F1
VGLL2
LACE1
MMS22L
FAM26E
MCM9
SCML4
CEP57L1
PPIL6
LOC285758
FLJ34503
DCBLD1
LOC285762
RSPH4A
GJB7
SNHG5
CENPW
C6orf174
LINC00222
CEP85L
LIN28B
FAM26F
GSTM2P1
RFPL4B
C6orf225
TSG1
LOC643623
TRAF3IP2-AS1
SNORD50B
MIR548B
LOC728012
TPI1P3
C6orf186
BET3L
LOC100130890
LOC100287632
MIR2113
MIR548H3
LOC100422737
MIR4464
MIR4643
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 13q13.3.

Table S4.  Genes in bold are cancer genes as defined by The Sanger Institute's Cancer Gene Census [7].

Genes
LCP1
RB1
LHFP
TTL
hsa-mir-92a-1
hsa-mir-622
hsa-mir-3169
hsa-mir-1297
hsa-mir-759
hsa-mir-15a
hsa-mir-3168
hsa-mir-621
hsa-mir-4305
ATP7B
KLF5
BTF3P11
RCBTB2
CLN5
CPB2
DACH1
DCT
EDNRB
ELF1
ESD
GPC5
FOXO1
MLNR
GTF2F2
GUCY1B2
HTR2A
KPNA3
LMO7
NEK3
PCDH8
PCDH9
POU4F1
ATXN8OS
TPT1
TRPC4
UCHL3
TNFSF11
SCEL
SUCLA2
DLEU2
TSC22D1
ITM2B
MTRF1
UTP14C
TBC1D4
GPC6
LPAR6
SLC25A15
TRIM13
MRPS31
SPRY2
DLEU1
PIBF1
OLFM4
POSTN
SUGT1
LECT1
WBP4
AKAP11
KLF12
FNDC3A
DIS3
MYCBP2
KIAA0564
ZC3H13
LRCH1
TGDS
SLITRK5
FBXL3
INTS6
CKAP2
NUFIP1
PCDH17
C13orf15
MED4
DNAJC15
VPS36
PHF11
UFM1
NDFIP2
ENOX1
RCBTB1
NUDT15
KIAA1704
THSD1
CYSLTR2
SPRYD7
COG6
KLHL1
RBM26
PCDH20
RNF219
NAA16
RNASEH2B
DHRS12
BORA
KIAA0226L
PROSER1
TDRD3
CDADC1
CAB39L
DIAPH3
CCDC70
COG3
SETDB2
KBTBD7
SLITRK6
EBPL
KBTBD6
EPSTI1
SLITRK1
KCTD12
ARL11
WDFY2
LINC00284
SLAIN1
PRR20A
LINC00410
FAM216B
LACC1
LINC00330
HNRNPA1L2
ST13P4
DGKH
CCDC122
GPR180
STOML3
COMMD6
FAM194B
SPERT
DLEU7
FAM124A
TPTE2P3
CTAGE10P
SLC25A30
OR7E156P
SUGT1P3
SIAH3
KCNRG
LINC00282
LINC00550
LINC00347
FREM2
NEK5
THSD1P1
KCTD4
NHLRC3
SERP2
LINC00547
LINC00548
MIR15A
MIR16-1
MIR17
MIR18A
MIR19A
MIR19B1
MIR20A
MIR92A1
MIR17HG
ALG11
MZT1
TSC22D1-AS1
MIR4500HG
SERPINE3
CTAGE11P
SNORA31
MIR621
MIR622
PRR20B
PRR20C
PRR20D
PRR20E
TPT1-AS1
MIR1297
MIR759
MIR320D1
MIR4305
MIR3169
MIR3665
MIR3613
RBM26-AS1
OR7E37P
LOC100507240
LOC100509894
MIR4500
MIR4703
LOC100616668
Arm-level results

Table 3.  Arm-level significance table: 10 significant results found. The significance cutoff is at Q value=0.25.

Arm # Genes Amp Frequency Amp Z score Amp Q value Del Frequency Del Z score Del Q value
1p 2121 0.00 -0.204 0.784 0.00 -0.204 0.784
1q 1955 0.00 -0.296 0.784 0.07 2.97 0.0127
2p 924 0.00 -0.654 0.784 0.00 -0.654 0.784
2q 1556 0.07 1.69 0.784 0.00 -0.454 0.784
3p 1062 0.00 -0.618 0.784 0.00 -0.618 0.784
3q 1139 0.07 1.12 0.784 0.00 -0.577 0.784
4p 489 0.00 -0.731 0.784 0.07 0.616 0.594
4q 1049 0.00 -0.621 0.784 0.00 -0.621 0.784
5p 270 0.00 -0.777 0.784 0.07 0.494 0.621
5q 1427 0.00 -0.477 0.784 0.13 3.46 0.00546
6p 1173 0.00 -0.588 0.784 0.00 -0.588 0.784
6q 839 0.00 -0.675 0.784 0.00 -0.675 0.784
7p 641 0.14 2.27 0.464 0.08 0.868 0.551
7q 1277 0.07 1.27 0.784 0.00 -0.539 0.784
8p 580 0.00 -0.537 0.784 0.47 9.12 0
8q 859 0.08 1.03 0.784 0.14 2.54 0.0374
9p 422 0.00 -0.745 0.784 0.07 0.577 0.594
9q 1113 0.00 -0.604 0.784 0.00 -0.604 0.784
10p 409 0.00 -0.774 0.784 0.00 -0.774 0.784
10q 1268 0.00 -0.542 0.784 0.07 1.26 0.377
11p 862 0.00 -0.669 0.784 0.00 -0.669 0.784
11q 1515 0.00 -0.468 0.784 0.07 1.61 0.213
12p 575 0.00 -0.686 0.784 0.13 2.07 0.095
12q 1447 0.00 -0.506 0.784 0.00 -0.506 0.784
13q 654 0.00 -0.694 0.784 0.07 0.72 0.589
14q 1341 0.00 -0.539 0.784 0.00 -0.539 0.784
15q 1355 0.00 -0.535 0.784 0.00 -0.535 0.784
16p 872 0.00 -0.644 0.784 0.07 0.878 0.551
16q 702 0.00 -0.684 0.784 0.07 0.753 0.589
17p 683 0.00 -0.663 0.784 0.13 2.19 0.0812
17q 1592 0.00 -0.442 0.784 0.07 1.76 0.176
18p 143 0.00 -0.743 0.784 0.20 2.95 0.0127
18q 446 0.00 -0.685 0.784 0.20 3.3 0.00635
19p 995 0.00 -0.635 0.784 0.00 -0.635 0.784
19q 1709 0.00 -0.415 0.784 0.00 -0.415 0.784
20p 355 0.00 -0.786 0.784 0.00 -0.786 0.784
20q 753 0.00 -0.696 0.784 0.00 -0.696 0.784
21q 509 0.00 -0.726 0.784 0.07 0.628 0.594
22q 921 0.00 -0.632 0.784 0.07 0.918 0.551
Xq 1312 0.00 -0.548 0.784 0.00 -0.548 0.784
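
As an illustration of how the significance cutoff is applied to this table, the sketch below flags arms whose amplification or deletion Q value falls under 0.25. Only a handful of rows from Table 3 are included, and the use of a strict comparison is an assumption; the report does not say how a Q value exactly equal to the cutoff is treated.

```python
Q_CUTOFF = 0.25

# (arm, amp_q, del_q) for a few rows copied from Table 3.
rows = [
    ("1q", 0.784, 0.0127),
    ("5q", 0.784, 0.00546),
    ("7p", 0.464, 0.551),
    ("8p", 0.784, 0.0),
    ("10q", 0.784, 0.377),
    ("11q", 0.784, 0.213),
]

# Collect (arm, event type) pairs that pass the cutoff.
significant = []
for arm, amp_q, del_q in rows:
    if amp_q < Q_CUTOFF:
        significant.append((arm, "amp"))
    if del_q < Q_CUTOFF:
        significant.append((arm, "del"))
```

In this subset, 1q, 5q, 8p, and 11q are flagged as significant deletions; 7p and 10q pass neither test.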
Methods & Data
Input
Description
  • Segmentation File: The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six-column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2 of copy number, minus 1).

  • Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).

  • Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.

  • CNV Files: There are two options for the CNV file. The first option allows CNVs to be identified by marker name. The second option allows the CNVs to be identified by genomic location. Option #1: A two-column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A six-column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End.

  • Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.

  • Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.

  • Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.

  • Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.

  • Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).

  • Confidence Level: Confidence level used to calculate the region containing a driver.

  • Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.

  • Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).

  • Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.

  • Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
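
The cap and threshold parameters described above can be illustrated with a short sketch. It is not GISTIC's implementation: the function names and the example segment are hypothetical, and the parameter values are the ones listed under Values for this run (cap of 2, thresholds of 0.3).

```python
def cap_value(log2_ratio: float, cap: float) -> float:
    """Clamp a log2 ratio into the interval [-cap, +cap]."""
    return max(-cap, min(cap, log2_ratio))


def classify(log2_ratio: float, amp_threshold: float, del_threshold: float) -> str:
    """Label a segment value using the amplification and deletion thresholds."""
    if log2_ratio > amp_threshold:
        return "amplified"
    if log2_ratio < -del_threshold:
        return "deleted"
    return "neutral"


# Hypothetical segmentation-file row with the six columns described above:
# Sample, Chromosome, Start Position, End Position, Num markers, Seg.CN
line = "sample_1\t8\t1000000\t2000000\t150\t2.7"
sample, chrom, start, end, n_markers, seg_cn = line.split("\t")

value = cap_value(float(seg_cn), cap=2.0)  # 2.7 exceeds the cap, so it is clamped
call = classify(value, amp_threshold=0.3, del_threshold=0.3)
```

Capping limits the influence of extreme segments, while the thresholds decide which segments count as events at all.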

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

  • Segmentation File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/PRAD-TP/4428946/segmentationfile.txt

  • Markers File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/PRAD-TP/4428946/markersfile.txt

  • Reference Genome = /xchip/cga/reference/gistic2/hg19_with_miR_20120227.mat

  • CNV Files = /xchip/cga/reference/gistic2/CNV.hg19.bypos.111213.txt

  • Amplification Threshold = 0.3

  • Deletion Threshold = 0.3

  • Cap Values = 2

  • Broad Length Cutoff = 0.5

  • Remove X-Chromosome = 0

  • Confidence Level = 0.99

  • Join Segment Size = 10

  • Arm Level Peel Off = 1

  • Maximum Sample Segments = 10000

  • Gene GISTIC = 0

Table 4.  First 10 out of 15 Input Tumor Samples.

Tumor Sample Names
TCGA-CH-5743-01A-21D-1572-02
TCGA-CH-5744-01A-11D-1572-02
TCGA-CH-5745-01A-11D-1572-02
TCGA-CH-5746-01A-11D-1572-02
TCGA-CH-5750-01A-11D-1572-02
TCGA-CH-5751-01A-11D-1572-02
TCGA-CH-5754-01A-11D-1572-02
TCGA-CH-5762-01A-11D-1572-02
TCGA-CH-5763-01A-11D-1572-02
TCGA-CH-5765-01A-11D-1572-02

Figure 3.  Segmented copy number profiles in the input data

Output
All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

  1. Unique Name: A name assigned to identify the region.

  2. Descriptor: The genomic descriptor of that region.

  3. Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

  4. Peak Limits: The boundaries of the region of maximal amplification or deletion.

  5. Region Limits: The boundaries of the entire significant region of amplification or deletion.

  6. Q values: The Q value of the peak region.

  7. Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

  8. Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').

  9. Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns varies slightly by section of the file.

The first section can be identified by the key given in column 9: it starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' The second section exactly reproduces the first section, except that here the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that here only broad events are included. A '1' in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

  1. Cytoband

  2. Q value

  3. Residual Q value

  4. Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.
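
A column-per-peak layout like this can be read as sketched below. The sketch assumes a tab-delimited file whose first column holds row labels and whose remaining columns each describe one peak; that layout detail and the function names are assumptions, and the example content is a shortened stand-in built from Table 1.

```python
import csv
import io


def cell(row, col):
    """Return the stripped cell at `col`, or '' when the row is shorter."""
    return row[col].strip() if col < len(row) else ""


def read_genes_file(handle):
    """Parse a column-per-peak genes file into one dict per peak."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, gene_rows = rows[:4], rows[4:]
    n_columns = max(len(row) for row in rows)
    peaks = []
    for col in range(1, n_columns):  # column 0 is assumed to hold row labels
        peaks.append({
            "cytoband": cell(header[0], col),
            "q_value": float(cell(header[1], col)),
            "residual_q_value": float(cell(header[2], col)),
            "wide_peak_boundaries": cell(header[3], col),
            "genes": [cell(r, col) for r in gene_rows if cell(r, col)],
        })
    return peaks


# Shortened, hypothetical file content for the single amplification peak.
example = (
    "cytoband\t8q23.1\n"
    "q value\t0.04721\n"
    "residual q value\t0.04721\n"
    "wide peak boundaries\tchr8:93935184-146364022\n"
    "genes in wide peak\tCOX6C\n"
    "\tEXT1\n"
    "\tMYC\n"
)
peaks = read_genes_file(io.StringIO(example))
```

Because peaks contain different numbers of genes, shorter columns are padded with empty cells, which the parser skips.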

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the GenePattern SNPViewer module and may be imported into the Integrative Genomics Viewer (IGV).
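
Because the scores file stores Q values on a -log10 scale, they have to be converted back before being compared with a cutoff such as 0.25. A minimal sketch, with hypothetical helper names:

```python
import math


def q_from_neg_log10(value: float) -> float:
    """Convert a -log10(q) score back to a Q value."""
    return 10.0 ** (-value)


def neg_log10_from_q(q: float) -> float:
    """Convert a Q value to the -log10 scale; a Q value of 0 maps to infinity."""
    if q <= 0:
        return math.inf
    return -math.log10(q)


# The 0.25 significance cutoff expressed on the -log10 scale.
cutoff_score = neg_log10_from_q(0.25)

# A locus is significant when its stored score exceeds the transformed cutoff.
is_significant = neg_log10_from_q(0.04721) > cutoff_score
```

On this scale larger scores mean smaller Q values, so significance corresponds to scores above the transformed cutoff.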

Segmented Copy Number (raw_copy_number.{fig|pdf|png} )

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries, and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by robust, and is contained within the wide peak reported in {amp|del}_genes.txt by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions have negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the least deletion value among the markers it covers.

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events, that is, events with lengths less than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplification larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletion greater than the minimum arm-level deletion observed for the sample.

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish high-level amplifications and deletions (+/-2) from ordinary amplifications and deletions (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. The cutoffs are calculated as the minimum arm-level amplification level less the deletion threshold for deletions, and the maximum arm-level amplification level plus the amplification threshold for amplifications.
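
The cutoff rule and the resulting five-level calls can be sketched as follows. This is an illustration of the description above, not GISTIC's code: the function names and arm-level values are hypothetical, and all quantities are assumed to be expressed in the same units.

```python
def sample_cutoffs(arm_values, amp_threshold, del_threshold):
    """Return (low, high) cutoffs for one sample from its arm-level values."""
    low = min(arm_values) - del_threshold   # cutoff for high-level deletions
    high = max(arm_values) + amp_threshold  # cutoff for high-level amplifications
    return low, high


def threshold_call(value, low, high, amp_threshold, del_threshold):
    """Map a gene-level value to one of -2, -1, 0, 1, 2."""
    if value > high:
        return 2
    if value > amp_threshold:
        return 1
    if value < low:
        return -2
    if value < -del_threshold:
        return -1
    return 0


# Hypothetical arm-level values for a single sample.
arm_values = [-0.4, 0.0, 0.5]
low, high = sample_cutoffs(arm_values, amp_threshold=0.3, del_threshold=0.3)

# Gene-level values spanning all five categories.
calls = [threshold_call(v, low, high, 0.3, 0.3) for v in (1.2, 0.5, 0.1, -0.5, -1.0)]
```

Because the cutoffs depend on each sample's own arm-level extremes, the same gene-level value can be a high-level event in one sample and an ordinary event in another.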

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks that can be loaded as a track in a compatible genome browser such as IGV or the UCSC genome browser.

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) identifies significantly amplified/deleted chromosome arms; and (ii) identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
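
The multiple-hypothesis correction step turns per-locus p-values into Q values. The report does not name the procedure, so the sketch below assumes the Benjamini-Hochberg false discovery rate method purely for illustration; the function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values (Q values) in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic Q values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical per-locus p-values.
pvalues = [0.01, 0.04, 0.03, 0.5]
qvalues = benjamini_hochberg(pvalues)

# Loci passing the report's Q value cutoff of 0.25.
significant = [i for i, q in enumerate(qvalues) if q < 0.25]
```

Each Q value estimates the false discovery rate incurred by calling that locus and all more significant loci.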

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and a second one comes from an analysis of significant variations among the blood normals in the TCGA data set.
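
Excluding germ line CNV regions by genomic location amounts to dropping markers that fall inside any listed interval. The sketch below shows that idea with hypothetical marker names and coordinates; it is not GISTIC's implementation, and it treats interval ends as inclusive by assumption.

```python
def in_any_region(chrom, pos, regions):
    """True when (chrom, pos) falls inside any (chrom, start, end) interval."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


# Hypothetical CNV wide regions: (chromosome, wide start, wide end).
cnv_regions = [("8", 1000, 2000)]

# Hypothetical markers: (marker name, chromosome, position).
markers = [
    ("m1", "8", 500),
    ("m2", "8", 1500),  # lies inside the CNV region on chromosome 8
    ("m3", "3", 1500),  # same position but a different chromosome
]

# Keep only markers that lie outside every CNV region.
kept = [m for m in markers if not in_any_region(m[1], m[2], cnv_regions)]
```

Only marker m2 is removed, since it is the only one that overlaps a listed region on the same chromosome.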

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References
[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A Vol. 104:50 (2007)
[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)
[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)
[6] McCarroll, S. A. et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40(10):1166-1174 (2008)