LowPass Copy number analysis (GISTIC2)
Esophageal Carcinoma (Primary solid tumor)
21 August 2015  |  analyses__2015_08_21
Maintainer Information
Maintained by Spring Yingchun Liu (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2015): LowPass Copy number analysis (GISTIC2). Broad Institute of MIT and Harvard. doi:10.7908/C10K27RF
Overview
Introduction

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. The pipeline first filters out normal samples from the segmented copy-number data by inspecting the TCGA barcodes and then executes GISTIC version 2.0.21 (Firehose task version: 127).
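
The barcode-based filtering step can be illustrated with a short sketch. It assumes the standard TCGA barcode layout, in which the fourth hyphen-separated field begins with a two-digit sample-type code (01-09 for tumors, 10-19 for normals). The function names are hypothetical and are not part of the Firehose pipeline.

```python
def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code from a TCGA barcode."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return int(fields[3][:2])


def is_tumor(barcode: str) -> bool:
    """Tumor sample types use codes 01-09; codes 10-19 denote normals."""
    return 1 <= sample_type_code(barcode) <= 9


barcodes = [
    "TCGA-IG-A3I8-01A-11D-A248-26",  # primary solid tumor (listed in Table 4)
    "TCGA-IG-A3I8-10A-01D-A248-26",  # hypothetical matched normal, for illustration
]
tumors = [b for b in barcodes if is_tumor(b)]
print(tumors)
```

Only the first barcode passes the filter, because its sample-type code is 01.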

Summary

There were 51 tumor samples used in this analysis: 16 significant arm-level results, 4 significant focal amplifications, and 8 significant focal deletions were found.

Results
Focal results

Figure 1.  Genomic positions of amplified regions: the X-axis represents the normalized amplification signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 1.  Amplifications table: 4 significant amplifications found. The genes in each wide peak are listed in Tables S1-S4. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband   Q value       Residual Q value   Wide Peak Boundaries        # Genes in Wide Peak
11q13.3    1.3341e-28    1.3341e-28         chr11:69380205-70286916     10
7p11.2     0.00024857    0.00024857         chr7:54768280-55828923      6
8q24.21    0.0010659     0.0010659          chr8:126775116-129726430    14
7q21.3     0.16112       0.16112            chr7:84055159-109068633     262
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 11q13.3.

Table S1.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CCND1
MIR548K
FGF3
CTTN
FGF4
PPFIA1
FADD
FGF19
ANO1
ORAOV1
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 7p11.2.

Table S2.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
EGFR
SNORA73|ENSG00000252054.1
SEC61G
LANCL2
VOPP1
FKBP9L
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 8q24.21.

Table S3.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
MYC
RN7SKP226
TMEM75
POU5F1B
CASC8
CCAT1
PCAT2
PCAT1
LINC00861
FAM84B
MIR1205
MIR1207
PVT1
MIR1208
Genes in Wide Peak

This is the comprehensive list of amplified genes in the wide peak for 7q21.3.

Table S4.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CDK6
AKAP9
FLJ00325
U3|ENSG00000238297.1
snoU109|ENSG00000238832.1
RNA5SP236
SNORD112|ENSG00000251911.1
RN7SL8P
LHFPL3
RN7SKP86
RN7SKP198
FAM185A
POLR2J2
MIR4467
MIR5090
MIR4285
SH2B2
SNORA48|ENSG00000252824.1
RABL5
MIR4653
RN7SKP54
MUC3A
MUC3A
RN7SL549P
RN7SL750P
SAP25
RN7SL416P
RN7SL161P
STAG3L5P
PILRB
GATS
MIR4658
MIR106B
MIR93
MIR25
AZGP1P1
SNORA40|ENSG00000222966.1
CYP3A4
snoU13|ENSG00000239133.1
MYH16
snoU13|ENSG00000238459.1
MIR3609
RN7SL13P
RN7SL478P
RN7SKP104
RN7SL252P
SHFM1
MIR591
PON1
snoU13|ENSG00000238384.1
RN7SKP129
GNG11
MIR489
MIR653
RN7SL7P
GATAD1
snoU13|ENSG00000238739.1
DPY19L2P4
SNORA67|ENSG00000207094.1
snoU13|ENSG00000238587.1
LINC00972
SEMA3A
ACHE
ASNS
AZGP1
CALCR
KRIT1
AP1S1
COL1A2
CUX1
CYP3A7
CYP3A5
CYP51A1
DLD
DLX5
DLX6
DYNC1I1
SLC26A3
EPHB4
EPO
GNB2
GNGT1
GPR22
GRM3
AGFG2
LAMB1
LRCH4
MCM7
DNAJB9
NPTX2
NRCAM
OCM2
ORC5
SERPINE1
PCOLCE
PDK4
SLC26A4
PEX1
CDK14
ABCB1
ABCB4
PIK3CG
POLR2J
PON2
PON3
PRKAR2B
RELN
PSMC2
SRI
SRPK2
SYPL1
TAC1
TAF6
TFR2
TRIP6
VGF
ZAN
ZNF3
ZKSCAN1
ZSCAN21
MTERF
TFPI2
TRRAP
FZD1
BUD31
SGCE
PLOD3
CLDN12
AP4M1
PMPCB
ATP5J2
DMTF1
MUC12
ARPC1B
NAMPT
RASA4
SLC25A13
LRRC17
POP7
BET1
COG5
ZNHIT1
ARPC1A
STAG3
CPSF4
DBF4
COPS6
DUS4L
TP53TG1
PDAP1
LAMB4
LMTK2
PEG10
ZKSCAN5
CLDN15
BRI3
TECPR1
PTCD1
FBXO24
STEAP1
HBP1
DNAJC2
PILRA
PNPLA8
FIS1
ACTL6B
SRRT
ASB4
ADAM22
ANKIB1
PUS7
CROT
ALKBH4
SAMD9
ZCWPW1
C7orf43
PPP1R9A
CCDC132
KMT2E
BAIAP2L1
SLC25A40
BCAP29
MEPCE
SLC12A9
ACN9
SMURF1
RINT1
MOSPD3
GIGYF1
CYP3A43
CASD1
ZNF655
PVRIG
TMEM243
STEAP4
GAL3ST4
PRKRIP1
C7orf63
CBLL1
ORAI2
OR2AE1
TSC22D4
TRIM56
ARMC10
RBM48
ZNF394
GTPBP10
TRIM4
MYL10
COL26A1
MUC17
RUNDC3B
C7orf66
THAP5
CCDC71L
BHLHA15
SAMD9L
C7orf62
ZNF804B
ZSCAN25
FAM200A
PPP1R35
GPC2
KIAA1324L
LRWD1
FBXL13
NAPEPLD
ATXN7L1
CDHR3
TMEM130
NYAP1
SEMA3D
CNPY4
POLR2J2
HEPACAM2
MBLAC1
FAM133B
STEAP2
ZNF789
MOGAT3
GJC3
DPY19L2P2
GATS
NAT16
SLC26A5
LAMTOR4
LRRD1
C7orf76
KPNA7
C7orf61
UFSP1
SPDYE3
SPDYE2
POLR2J3
EFCAB10
UPK3BL
RASA4B
SPDYE2B
MIR548O
MIR4652
MIR5692C2
MIR5692A1

Figure 2.  Genomic positions of deleted regions: the X-axis represents the normalized deletion signals (top) and significance by Q value (bottom). The green line represents the significance cutoff at Q value=0.25.

Table 2.  Deletions table: 8 significant deletions found. The genes in each wide peak are listed in Tables S5-S12. If no genes were identified within the peak, the nearest gene appears in brackets.

Cytoband   Q value      Residual Q value   Wide Peak Boundaries        # Genes in Wide Peak
9p21.3     8.293e-33    8.293e-33          chr9:21959090-22010600      3
7q36.1     0.0014602    0.0014602          chr7:143818945-159138663    124
16q23.1    0.0014602    0.0014602          chr16:78589811-78992298     1
2q22.2     0.016553     0.079326           chr2:142380819-142702424    1
2q22.1     0.014855     0.10923            chr2:141742653-142097440    1
10p11.21   0.18592      0.18592            chr10:34506934-34848990     1
11q25      0.19993      0.19993            chr11:74482328-135006516    483
6p25.3     0.22288      0.22288            chr6:1-5056186              38
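
The wide-peak boundaries in Tables 1 and 2 are plain `chrN:start-end` strings, so they can be split with a few lines of code when comparing peak sizes. The sketch below is illustrative only. It assumes both coordinates are inclusive, which the report does not state, and the helper names are hypothetical.

```python
import re

REGION_RE = re.compile(r"^chr(\w+):(\d+)-(\d+)$")


def parse_region(region):
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def region_width(region):
    """Width in bases, assuming inclusive start and end coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1


# Wide peak of the 9p21.3 deletion, copied from Table 2.
print(parse_region("chr9:21959090-22010600"))
print(region_width("chr9:21959090-22010600"))
```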
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 9p21.3.

Table S5.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
CDKN2A
CDKN2B
C9orf53
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 7q36.1.

Table S6.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
EZH2
LINC00689
MIR595
RN7SL142P
SHH
RN7SKP280
HTR5A
RN7SL845P
RN7SL811P
SNORA26|ENSG00000212590.1
snoU13|ENSG00000238557.1
RNA5SP250
FABP5P3
snoU13|ENSG00000239045.1
RN7SL76P
MIR3907
MIR671
IQCA1P1
CDK5
SSPO
SNORD112|ENSG00000252557.1
RN7SL521P
RNY1
RNY3
RNY4
RNY5
RN7SL569P
RN7SL72P
U3|ENSG00000199370.1
RN7SL456P
RNA5SP249
RN7SL207P
RN7SKP174
RNU6ATAC40P
OR2A9P
OR2A20P
AOC1
DPP6
EN2
GBX1
MNX1
INSIG1
KCNH2
NOS3
PTPRN2
RARRES2
RHEB
SLC4A2
SMARCD3
VIPR2
XRCC2
ARHGEF5
ZNF212
ZNF282
CUL1
ASIC3
PDIA4
UBE3C
DNAJB6
ABCF2
FASTK
ABCB8
PAXIP1
CNTNAP2
GIMAP2
TPK1
ZNF777
TMEM176B
REPIN1
PRKAG2
NUB1
CHPF2
NCAPG2
WDR60
GIMAP4
GIMAP5
TMEM176A
ACTR3B
ESYT2
ZNF398
KMT2C
GALNT11
LMBR1
NOM1
LRRC61
ZNF767
TMUB1
KRBA1
ZBED6CL
AGAP3
C7orf13
NOBOX
OR2A14
ZNF786
ASB10
RNF32
GIMAP8
CRYGN
ZNF425
ZNF746
ATP6V0E2
RBM33
GALNTL5
GIMAP7
ZNF467
GIMAP1
C7orf33
CNPY1
ZNF775
ATG9B
BLACE
OR2A1
WDR86
OR2A7
OR2A42
ARHGEF35
GIMAP6
ZNF862
ACTR3C
CTAGE4
CTAGE8
ZNF783
MIR548F4
MIR5707
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 16q23.1.

Table S7.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
WWOX
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 2q22.2.

Table S8.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
LRP1B
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 2q22.1.

Table S9.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
LRP1B
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 10p11.21.

Table S10.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
PARD3
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 11q25.

Table S11.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
DDX6
PCSK7
SDHD
MAML2
BIRC3
ATM
CBL
DDX10
FLI1
PAFAH1B2
POU2AF1
SDHD
PICALM
ARHGEF12
snoU13|ENSG00000238693.1
RNU6ATAC12P
RN7SL167P
LINC00167
KCNJ5
RN7SKP279
RN7SKP121
MIR3167
snoU13|ENSG00000238855.1
RN7SL351P
KRT18P59
SLC37A2
RNA5SP352
TBRG1
OR10D3
U8|ENSG00000200496.1
SNORD14C
SNORD14D
SNORD14E
snoU13|ENSG00000239079.1
RNU4ATAC5P
RNU4ATAC10P
SC5D
TBCEL
OAF
THY1
MFRP
ACA64|ENSG00000252119.1
HINFP
C2CD2L
MIR3656
RPS25
RN7SL529P
RN7SL688P
BCL9L
CXCR5
TTC36
RN7SL86P
CD3G
MPZL3
TMPRSS4
SCARNA11|ENSG00000252992.1
RNY4P6
ZNF259
snoU13|ENSG00000238625.1
LINC00900
snoU13|ENSG00000239153.1
ACA59|ENSG00000252870.1
snoU13|ENSG00000238724.1
ATF4P4
snosnR66
C11orf34
RNA5SP351
HSPB2
ALG9
ALG9
RN7SKP273
SIK2
RNA5SP350
SNORD39|ENSG00000264997.1
RNA5SP349
RNA5SP348
MMP12
WTAPP1
snoU13|ENSG00000239154.1
snoU13|ENSG00000252679.1
MIR3920
snoU13|ENSG00000238388.1
RN7SKP115
RN7SL222P
RN7SKP53
RNA5SP347
RNA5SP346
RNA5SP345
SRSF8
SRSF8
MIR548L
VSTM5
RN7SL195P
snoU13|ENSG00000238437.1
MED17
SNORA40|ENSG00000210825.1
SNORA18|ENSG00000207145.1
SNORD5|ENSG00000239195.1
SNORA8|ENSG00000207304.1
SNORA1|ENSG00000206834.1
SNORD6
SNORA32|ENSG00000206799.1
SNORA25|ENSG00000207112.1
SCARNA9
RN7SL223P
SLC36A4
snoU13|ENSG00000239086.1
DISC1FP1
CHORDC1
SNORD56|ENSG00000207299.1
TRIM49D1
TRIM64B
TRIM77
FOLH1B
GRM5
TMEM135
RN7SL225P
snoU13|ENSG00000238666.1
PCF11
SNORA70E
snoU13|ENSG00000238995.1
SNORD112|ENSG00000252592.1
DKFZP434E1119
ACER3
GUCY2EP
RNA5SP344
UVRAG
RN7SL786P
MOGAT2
SNORD15B
SNORD15A
MIR326
NEU3
RN7SL239P
ACAT1
ACRV1
BIRC2
APLP2
APOA1
APOA4
APOC3
ARCN1
ARRB1
FXYD2
CAPN5
CASP1
CASP4
CASP5
SERPINH1
CD3D
CD3E
CTSC
CHEK1
CLNS1A
CRYAB
DLAT
DLG2
DPAGT1
DRD2
ETS1
FDX1
FUT4
SLC37A4
LRRC32
GRIA4
GRIK4
GUCY1A2
H2AFX
HMBS
HSPA8
HTR3A
IL10RA
IL18
STT3A
KCNJ1
VWA5A
MAP6
MCAM
KMT2A
MMP1
MMP3
MMP7
MMP8
MMP10
MMP13
MRE11A
MTNR1B
MYO7A
NCAM1
NDUFC2
NFRKB
NNMT
NPAT
NRGN
OMP
OPCML
PAK1
PGR
PPP2R1B
PRCP
PRKRIR
PTS
PVRL1
RDX
RPS3
SCN2B
SCN4B
ST3GAL4
SLN
SORL1
SRPR
ST14
TAGLN
TECTA
THRSP
TRPC6
TYR
UPK2
WNT11
ZBTB16
ZNF202
CUL5
FZD4
BARX2
JRKL
EED
MTMR2
USP2
HTR3B
ZW10
MMP20
UBE4A
EI24
FEZ1
CEP57
ARHGAP32
SPCS2
GAB2
NAALAD2
RBM7
MPZL2
YAP1
HYOU1
ATP5L
ME3
GPR83
ADAMTS8
PRSS23
TREH
SLCO2B1
CEP164
IGSF9B
ENDOD1
EXPH5
PHLDB1
NCAPD3
SIK3
VSIG2
BACE1
TRIM29
RAB38
CADM1
PANX1
POU2F3
TSKU
REXO2
TENM4
OR8B8
TIMM8B
OR8B2
ACAD8
B3GAT1
RAB30
DCPS
C11orf54
AAMDC
ZBTB44
THYN1
DDX25
NOX4
NTM
CDON
SIDT2
TRAPPC4
C11orf73
CWC15
RSF1
SPA17
FXYD6
CNTN5
SIAE
C11orf71
ROBO4
SLC35F2
RAB39A
BTG4
NXPE4
SYTL2
ANKRD49
TTC12
C11orf57
ELMOD1
FOXRED1
KDM4D
SCN3B
VPS11
TMEM126B
TEX12
CRTAM
IFT46
SMCO4
C11orf30
PRDM10
TRIM49
DSCAML1
GRAMD1B
USP35
KIAA1377
ARHGAP20
USP28
CREBZF
CARD18
CCDC90B
CCDC81
AASDHPPT
PKNOX2
TP53AIP1
MMP27
ABCG4
ROBO3
C11orf1
KCTD14
ALG8
TAF1D
RNF26
FAM118B
DYNC2H1
NLRX1
MSANTD2
NARS2
CCDC82
CLMP
PDZD3
C11orf63
CCDC15
PDGFD
TMPRSS5
GDPD5
PUS3
MFRP
JAM3
BCO2
TMEM133
TMPRSS13
TMEM126A
DCUN1D5
MSANTD4
KIRREL3
DGAT2
BUD13
TMEM25
RPUSD4
UBASH3B
C11orf70
DIXDC1
KIAA1731
ZC3H12C
GLB1L2
ESAM
ALKBH8
FDXACB1
C11orf52
INTS4
VPS26B
GLB1L3
TIRAP
CARD16
C1QTNF5
TMEM123
PANX3
APOA5
FAT3
TMEM45B
C11orf93
PIH1D2
NXPE1
NXPE2
AMICA1
XRRA1
FAM76B
SESN3
PIWIL4
ARHGAP42
KBTBD3
CWF19L2
KDELC2
LAYN
AMOTL1
CCDC67
PATE1
C11orf65
ADAMTS15
B3GNT6
C11orf45
HYLS1
TMEM218
OR8B12
OR10G8
OR10G9
OR10S1
OR6T1
OR4D5
TMEM136
SPATA19
GDPD4
C11orf82
CCDC83
HEPACAM
FAM181B
CCDC89
ANGPTL5
RNF169
ANKK1
RNF214
AQP11
FOXR1
CCDC153
OR8D1
OR8D2
OR8B4
C11orf44
KLHL35
KCTD21
CCDC84
TMEM225
OR8D4
ANKRD42
C11orf53
OR2AT4
HEPHL1
FOLR4
KDM4E
BSX
OR6X1
OR6M1
OR10G4
OR10G7
OR8B3
OR8A1
C11orf87
C11orf92
C11orf88
PATE2
PATE4
SNX19
MIRLET7A2
MIR100
MIR125B1
MIR34B
MIR34C
DDI1
BLID
CARD17
HEPN1
TRIM49C
CLDN25
MIR708
PATE3
MIR1261
MIR1304
MIR4300
MIR4301
MIR1260B
MIR3166
CASP12
TPBGL
MIR4697
MIR4490
MIR4493
MIR4491
MIR4492
MIR4693
MIR5579
Genes in Wide Peak

This is the comprehensive list of deleted genes in the wide peak for 6p25.3.

Table S12.  Genes in bold are cancer genes as defined by The Sanger Institute: Cancer Gene Census[7].

Genes
IRF4
RMRPP2
snoU13|ENSG00000238801.1
RNA5SP202
snoU13|ENSG00000252668.1
C6ORF50
RNA5SP201
C6orf195
RN7SL352P
snoU13|ENSG00000238438.1
BPHL
SERPINB1
FOXF2
FOXC1
GMDS
NQO2
SERPINB6
SERPINB9
TUBB2A
RIPK1
PRPF4B
CDYL
ECI2
RPP40
FAM50B
EXOC2
WRNIP1
DUSP22
SLC22A23
FOXQ1
HUS1B
PXDC1
FAM217A
MYLK4
TUBB2B
PSMG4
C6orf201
MIR4645
Arm-level results

Table 3.  Arm-level significance table: 16 significant results found. The significance cutoff is at Q value=0.25.

Arm   # Genes   Amp Frequency   Amp Z score   Amp Q value   Del Frequency   Del Z score   Del Q value
1p    1300      0.09            0.243         0.968         0.11            0.745         0.571
1q    1195      0.43            7.79          1.33e-13      0.16            1.23          0.364
2p    624       0.22            0.386         0.968         0.03            -2.75         0.997
2q    967       0.14            0.121         0.968         0.05            -1.76         0.997
3p    644       0.05            -1.56         0.996         0.64            7.95          3.55e-14
3q    733       0.53            5.59          2.24e-07      0.42            3.5           0.00236
4p    289       0.15            -1.4          0.996         0.39            2.06          0.0976
4q    670       0.08            -1.75         0.996         0.27            1.43          0.307
5p    183       0.40            1.73          0.208         0.24            -0.545        0.997
5q    905       0.03            -1.9          0.996         0.36            4.22          0.000242
6p    710       0.09            -1.63         0.996         0.13            -0.979        0.997
6q    556       0.05            -2.64         0.996         0.16            -0.809        0.997
7p    389       0.49            4             0.000256      0.14            -1.21         0.997
7q    783       0.44            5.09          1.8e-06       0.03            -1.94         0.997
8p    338       0.26            0.102         0.968         0.29            0.641         0.604
8q    551       0.53            5.29          8.07e-07      0.22            0.139         0.773
9p    301       0.07            -2.38         0.996         0.43            2.77          0.0226
9q    700       0.20            0.173         0.968         0.23            0.81          0.571
10p   253       0.16            -1.51         0.996         0.18            -1.24         0.997
10q   738       0.05            -2.11         0.996         0.22            0.854         0.571
11p   509       0.11            -1.63         0.996         0.30            1.31          0.345
11q   975       0.13            -0.133        0.996         0.26            2.48          0.0432
12p   339       0.41            2.57          0.0291        0.16            -1.22         0.997
12q   904       0.16            0.204         0.968         0.16            0.204         0.762
13q   560       0.14            -0.971        0.996         0.35            2.3           0.061
14q   938       0.15            0.196         0.968         0.11            -0.541        0.997
15q   810       0.09            -1.43         0.996         0.11            -1.08         0.997
16p   559       0.12            -1.32         0.996         0.24            0.488         0.604
16q   455       0.13            -1.49         0.996         0.26            0.524         0.604
17p   415       0.12            -1.65         0.996         0.24            0.0563        0.796
17q   972       0.17            0.75          0.824         0.12            -0.378        0.997
18p   104       0.32            0.333         0.968         0.33            0.579         0.604
18q   275       0.09            -2.22         0.996         0.38            1.8           0.158
19p   681       0.07            -2.08         0.996         0.12            -1.12         0.997
19q   935       0.11            -0.711        0.996         0.09            -1.08         0.997
20p   234       0.37            1.56          0.266         0.15            -1.59         0.997
20q   448       0.40            2.86          0.014         0.03            -2.61         0.997
21q   258       0.12            -1.72         0.996         0.52            4.04          0.000351
22q   564       0.20            -0.139        0.996         0.26            0.756         0.571
Xq    668       0.24            0.788         0.824         0.22            0.475         0.604
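
As a reading aid for Table 3, the sketch below applies the Q value cutoff of 0.25 to a handful of rows copied from the table. Treating the cutoff as a strict inequality is an assumption, and the function name is hypothetical.

```python
Q_CUTOFF = 0.25

# (arm, amplification Q value, deletion Q value), copied from rows of Table 3.
rows = [
    ("1q", 1.33e-13, 0.364),
    ("3p", 0.996, 3.55e-14),
    ("3q", 2.24e-07, 0.00236),
    ("6p", 0.996, 0.997),
    ("20p", 0.266, 0.997),
]


def significant_events(rows, cutoff=Q_CUTOFF):
    """Map each arm to the event types whose Q value falls below the cutoff."""
    calls = {}
    for arm, amp_q, del_q in rows:
        events = []
        if amp_q < cutoff:
            events.append("amp")
        if del_q < cutoff:
            events.append("del")
        if events:
            calls[arm] = events
    return calls


calls = significant_events(rows)
print(calls)
```

In this subset, 3q passes the cutoff for both amplification and deletion, while 6p and 20p pass for neither.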
Methods & Data
Input
Description
  • Segmentation File: The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six-column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. The column headers are: (1) Sample (sample name), (2) Chromosome (chromosome number), (3) Start Position (segment start position, in bases), (4) End Position (segment end position, in bases), (5) Num markers (number of markers in segment), (6) Seg.CN (log2(copy number) - 1, i.e. the log2 ratio relative to two copies).

  • Markers File: The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. The column headers are: (1) Marker Name, (2) Chromosome, (3) Marker Position (in bases).

  • Reference Genome: The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in Matlab and are not viewable with a text editor.

  • CNV Files: There are two options for the CNV file. The first option allows CNVs to be identified by marker name. The second option allows the CNVs to be identified by genomic location. Option #1: A two-column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers file. The CNV identifiers are for user use and can be arbitrary. The column headers are: (1) Marker Name, (2) CNV Identifier. Option #2: A six-column, tab-delimited file with an optional header row. The 'CNV Identifier' is for user use and can be arbitrary. 'Narrow Region Start' and 'Narrow Region End' are also not used. The column headers are: (1) CNV Identifier, (2) Chromosome, (3) Narrow Region Start, (4) Narrow Region End, (5) Wide Region Start, (6) Wide Region End.

  • Amplification Threshold: Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified.

  • Deletion Threshold: Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions.

  • Cap Values: Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap are set to the cap value; regions with a log2 ratio less than -cap value are set to -cap. Values must be positive.

  • Broad Length Cutoff: Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm.

  • Remove X-Chromosome: Flag indicating whether to remove data from the X-chromosome before analysis. Allowed values= {1,0} (1: Remove X-Chromosome, 0: Do not remove X-Chromosome).

  • Confidence Level: Confidence level used to calculate the region containing a driver.

  • Join Segment Size: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number.

  • Arm Level Peel Off: Flag set to enable arm-level peel-off of events during peak definition. The arm-level peel-off enhancement to the arbitrated peel-off method assigns all events in the same chromosome arm of the same sample to a single peak. It is useful when peaks are split by noise or chromothripsis. Allowed values= {1,0} (1: Use arm level peel off, 0: Use normal arbitrated peel-off).

  • Maximum Sample Segments: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis.

  • Gene GISTIC: When enabled (value = 1), this option causes GISTIC to analyze deletions using genes instead of array markers to locate the lesion. In this mode, the copy number assigned to a gene is the lowest copy number among the markers that represent the gene.
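
The segmentation file layout described in this list can be read with a few lines of code. The following is a minimal sketch, not the pipeline's own loader. It reads Seg.CN as log2(copy number) - 1, so a value of 0 corresponds to two copies; the function names and the example rows are invented for illustration.

```python
import csv
import io


def read_segments(handle, has_header=True):
    """Parse a six-column, tab-delimited segmentation file into dicts."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the optional column-name line
    segments = []
    for row in reader:
        if not row:
            continue
        sample, chrom, start, end, num_markers, seg_cn = row
        segments.append({
            "sample": sample,
            "chrom": chrom,  # kept as text so that "X" is accepted
            "start": int(start),
            "end": int(end),
            "num_markers": int(num_markers),
            "seg_cn": float(seg_cn),
        })
    return segments


def to_copy_number(seg_cn):
    """Invert Seg.CN = log2(copy number) - 1 to recover the copy number."""
    return 2.0 ** (seg_cn + 1.0)


# Hypothetical two-segment input; the values are invented for illustration.
example = io.StringIO(
    "Sample\tChromosome\tStart Position\tEnd Position\tNum markers\tSeg.CN\n"
    "sample_A\t1\t10000\t5000000\t250\t0.0\n"
    "sample_A\t1\t5000001\t9000000\t180\t1.0\n"
)
segments = read_segments(example)
copy_numbers = [to_copy_number(s["seg_cn"]) for s in segments]
print(copy_numbers)
```

Under this reading, the two example segments correspond to two and four copies respectively.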

Values

List of inputs used for this run of GISTIC2. All files listed should be included in the archived results.

  • Segmentation File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/ESCA-TP/19775718/segmentationfile.txt

  • Markers File = /xchip/cga/gdac-prod/tcga-gdac/jobResults/PrepareGisticDNASeq/ESCA-TP/19775718/markersfile.txt

  • Reference Genome = /xchip/cga/reference/gistic2/hg19_GENCODE_v18_20140127.mat

  • CNV Files = /xchip/cga/reference/gistic2/CNV.hg19.bypos.111213.txt

  • Amplification Threshold = 0.3

  • Deletion Threshold = 0.3

  • Cap Values = 2

  • Broad Length Cutoff = 0.5

  • Remove X-Chromosome = 0

  • Confidence Level = 0.99

  • Join Segment Size = 10

  • Arm Level Peel Off = 1

  • Maximum Sample Segments = 10000

  • Gene GISTIC = 0
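
To make the numeric settings above concrete, the sketch below applies this run's cap value, amplification and deletion thresholds, and broad length cutoff to individual values. It is an illustration of the parameter definitions rather than GISTIC's implementation. The function names are hypothetical, and the use of strict inequalities at the boundaries is an assumption.

```python
AMP_THRESHOLD = 0.3        # Amplification Threshold used in this run
DEL_THRESHOLD = 0.3        # Deletion Threshold used in this run
CAP = 2.0                  # Cap Values used in this run
BROAD_LENGTH_CUTOFF = 0.5  # fraction of a chromosome arm


def cap_value(log2_ratio, cap=CAP):
    """Clamp a log2 ratio to the interval [-cap, +cap]."""
    return max(-cap, min(cap, log2_ratio))


def classify(log2_ratio):
    """Label a capped log2 ratio as amplified, deleted, or neutral."""
    value = cap_value(log2_ratio)
    if value > AMP_THRESHOLD:
        return "amplified"
    if value < -DEL_THRESHOLD:
        return "deleted"
    return "neutral"


def is_broad(event_length, arm_length, cutoff=BROAD_LENGTH_CUTOFF):
    """Treat an event as broad when it spans more than `cutoff` of the arm."""
    return event_length / arm_length > cutoff


for ratio in (3.7, 0.45, 0.1, -0.8):
    print(ratio, cap_value(ratio), classify(ratio))
```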

Table 4.  First 10 out of 51 input tumor samples.

Tumor Sample Names
TCGA-IG-A3I8-01A-11D-A248-26
TCGA-IG-A3QL-01A-11D-A248-26
TCGA-IG-A3Y9-01A-12D-A248-26
TCGA-IG-A3YA-01A-11D-A248-26
TCGA-IG-A3YB-01A-11D-A248-26
TCGA-IG-A3YC-01A-11D-A248-26
TCGA-IG-A4P3-01A-11D-A267-26
TCGA-IG-A4QT-01A-21D-A267-26
TCGA-IG-A50L-01A-11D-A267-26
TCGA-IG-A51D-01A-11D-A267-26

Figure 3.  Segmented copy number profiles in the input data

Output
All Lesions File (all_lesions.conf_##.txt, where ## is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

  1. Unique Name: A name assigned to identify the region.

  2. Descriptor: The genomic descriptor of that region.

  3. Wide Peak Limits: The 'wide peak' boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

  4. Peak Limits: The boundaries of the region of maximal amplification or deletion.

  5. Region Limits: The boundaries of the entire significant region of amplification or deletion.

  6. Q values: The Q value of the peak region.

  7. Residual Q values: The Q value of the peak region after removing ('peeling off') amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

  8. Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called 'broad'), focal events (called 'focal'), or independently significant broad and focal events (called 'both').

  9. Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.

Sample Data

Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns varies slightly by section of the file.

The first section can be identified by the key given in column 9: it starts in row 2 and continues until the row that reads 'Actual Copy Change Given.' This section contains summarized data for each sample. A '0' indicates that the copy number of the sample was not amplified or deleted beyond the threshold amount in that peak region. A '1' indicates that the sample had low-level copy number aberrations (exceeding the low threshold indicated in column 9), and a '2' indicates that the sample had high-level copy number aberrations (exceeding the high threshold indicated in column 9).

The second section can be identified by the rows in which column 9 reads 'Actual Copy Change Given.' It reproduces the first section exactly, except that the actual changes in copy number are provided rather than zeroes, ones, and twos.

The final section is similar to the first section, except that only broad events are included. A '1' in the sample columns (columns 10+) indicates that the median copy number of the sample across the entire significant region exceeded the threshold given in column 9. That is, it indicates whether the sample had a geographically extended event, rather than a focal amplification or deletion covering little more than the peak region.

Amplification Genes File (amp_genes.conf_##.txt, where ## is the confidence level)

The amp genes file contains one column for each amplification peak identified in the GISTIC analysis. The first four rows are:

  1. Cytoband

  2. Q value

  3. Residual Q value

  4. Wide Peak Boundaries

These rows identify the lesion in the same way as the all lesions file. The remaining rows list the genes contained in each wide peak. For peaks that contain no genes, the nearest gene is listed in brackets.
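
Because the genes files are laid out with one column per peak, reading them means transposing the table. The sketch below shows one way to do that. It assumes a tab-delimited file whose first column holds row labels, which this report does not confirm; the label text, the function name, and the truncated example are illustrative (the peak values are taken from Table 1).

```python
import csv
import io


def read_genes_file(handle):
    """Return one dict per peak from a column-per-peak genes file."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header_rows, gene_rows = rows[:4], rows[4:]

    def cell(row, col):
        return row[col].strip() if col < len(row) else ""

    peaks = []
    for col in range(1, len(header_rows[0])):
        cytoband = cell(header_rows[0], col)
        if not cytoband:
            continue  # tolerate empty trailing columns
        peaks.append({
            "cytoband": cytoband,
            "q_value": float(cell(header_rows[1], col)),
            "residual_q_value": float(cell(header_rows[2], col)),
            "wide_peak": cell(header_rows[3], col),
            "genes": [g for g in (cell(r, col) for r in gene_rows) if g],
        })
    return peaks


# Truncated example: two peaks from Table 1 with only a few of their genes.
example = io.StringIO(
    "cytoband\t11q13.3\t7p11.2\n"
    "q value\t1.3341e-28\t0.00024857\n"
    "residual q value\t1.3341e-28\t0.00024857\n"
    "wide peak boundaries\tchr11:69380205-70286916\tchr7:54768280-55828923\n"
    "genes in region\tCCND1\tEGFR\n"
    "\tFGF3\tSEC61G\n"
    "\tCTTN\t\n"
)
peaks = read_genes_file(example)
for peak in peaks:
    print(peak["cytoband"], peak["genes"])
```

The same function would apply to the deletion genes file, since the report describes its format as identical.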

Deletion Genes File (del_genes.conf_##.txt, where ## is the confidence level)

The del genes file contains one column for each deletion peak identified in the GISTIC analysis. The file format for the del genes file is identical to the format for the amp genes file.

Gistic Scores File (scores.gistic)

The scores file lists the Q values [presented as -log10(q)], G scores, average amplitudes among aberrant samples, and frequency of aberration, across the genome for both amplifications and deletions. The scores file is viewable with the GenePattern SNPViewer module and may be imported into the Integrative Genomics Viewer (IGV).

Segmented Copy Number (raw_copy_number.{fig|pdf|png} )

The segmented copy number is a pdf file containing a colormap image of the segmented copy number profiles in the input data.

Amplification Score GISTIC plot (amp_qplot.{fig|pdf|png|v2.pdf})

The amplification pdf is a plot of the G scores (top) and Q values (bottom) with respect to amplifications for all markers over the entire region analyzed.

Deletion Score GISTIC plot (del_qplot.{fig|pdf|png|v2.pdf})

The deletion pdf is a plot of the G scores (top) and Q values (bottom) with respect to deletions for all markers over the entire region analyzed.

Tables (table_{amp|del}.conf_##.txt, where ## is the confidence level)

Tables of basic information about the genomic regions (peaks) that GISTIC determined to be significantly amplified or deleted. These describe three kinds of peak boundaries, and list the genes contained in two of them. The region start and region end columns (along with the chromosome column) delimit the entire area containing the peak that is above the significance level. The region may be the same for multiple peaks. The peak start and end delimit the maximum value of the peak. The extended peak is the peak determined by robust, and is contained within the wide peak reported in {amp|del}_genes.txt by one marker.

Broad Significance Results (broad_significance_results.txt)

A table of per-arm statistical results for the data set. Each arm is a row in the table. The first column specifies the arm and the second column counts the number of genes known to be on the arm. For both amplification and deletion, the table has columns for the frequency of amplification or deletion of the arm, and a Z score and Q value.

Broad Values By Arm (broad_values_by_arm.txt)

A table of chromosome arm amplification levels for each sample. Each row is a chromosome arm, and each column a sample. The data are in units of absolute copy number -2.

All Data By Genes (all_data_by_genes.txt)

A gene-level table of copy number values for all samples. Each row is the data for a gene. The first three columns name the gene, its NIH locus ID, and its cytoband - the remaining columns are the samples. The copy number values in the table are in units of (copy number -2), so that no amplification or deletion is 0, genes with amplifications have positive values, and genes with deletions are negative values. The data are converted from marker level to gene level using the extreme method: a gene is assigned the greatest amplification or the least deletion value among the markers it covers.

Broad Data By Genes (broad_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only broad events with lengths greater than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

Focal Data By Genes (focal_data_by_genes.txt)

A gene-level table of copy number data similar to the all_data_by_genes.txt output, but using only focal events, that is, events with lengths shorter than the broad length cutoff. The structure of the file and the methods and units used for the data analysis are otherwise identical to all_data_by_genes.txt.

All Thresholded By Genes (all_thresholded.by_genes.txt)

A gene-level table of discrete amplification and deletion indicators for all samples. There is a row for each gene. The first three columns name the gene, its NIH locus ID, and its cytoband; the remaining columns are the samples. A table value of 0 means no amplification or deletion above the threshold. Amplifications are positive numbers: 1 means amplification above the amplification threshold; 2 means amplification larger than the arm-level amplifications observed for the sample. Deletions are represented by negative table values: -1 represents deletion beyond the threshold; -2 means deletion greater than the minimum arm-level deletion observed for the sample.

Sample Cutoffs (sample_cutoffs.txt)

A table of the per-sample threshold cutoffs (in units of absolute copy number -2) used to distinguish high-level amplifications and deletions (+/-2) from ordinary ones (+/-1) in the all_thresholded.by_genes.txt output file. The table contains three columns: the sample identifier followed by the low (deletion) and high (amplification) cutoff values. For deletions, the cutoff is the minimum arm-level amplification level minus the deletion threshold; for amplifications, it is the maximum arm-level amplification level plus the amplification threshold.
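
The relationship between the per-sample cutoffs and the five discrete levels in all_thresholded.by_genes.txt can be sketched as follows. This is an illustration of the rules as described in this report, not GISTIC's code. It assumes that the gene values, arm-level values, and thresholds are all expressed in the same units, and the function names and example numbers are hypothetical.

```python
AMP_THRESHOLD = 0.3  # amplification threshold used in this run
DEL_THRESHOLD = 0.3  # deletion threshold used in this run


def sample_cutoffs(arm_values, amp_t=AMP_THRESHOLD, del_t=DEL_THRESHOLD):
    """Return (low, high) cutoffs from one sample's arm-level values."""
    low = min(arm_values) - del_t   # minimum arm level less the deletion threshold
    high = max(arm_values) + amp_t  # maximum arm level plus the amplification threshold
    return low, high


def threshold_call(value, low, high, amp_t=AMP_THRESHOLD, del_t=DEL_THRESHOLD):
    """Map a gene-level value to one of -2, -1, 0, 1, 2."""
    if value > amp_t:
        return 2 if value > high else 1
    if value < -del_t:
        return -2 if value < low else -1
    return 0


# Invented arm-level values for a single sample.
low, high = sample_cutoffs([-0.4, 0.0, 0.5])
calls = [threshold_call(v, low, high) for v in (1.5, 0.5, 0.1, -0.5, -1.2)]
print(low, high, calls)
```

Checking the basic threshold first keeps a value inside the neutral band at 0 even when a sample's arm-level values are all negative or all positive.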

Focal Input To Gistic (focal_input.seg.txt)

A list of copy number segments describing just the focal events present in the data. The segment amplification/deletion levels are in units of (copy number -2), with amplifications positive and deletions negative numbers. This file may be viewed with IGV.

Gene Counts vs. Copy Number Alteration Frequency (freqarms_vs_ngenes.{fig|pdf})

An image showing the correlation between gene counts and frequency of copy number alterations.

Confidence Intervals (regions_track.conf_##.bed, where ## is the confidence level)

A file indicating the position of the confidence intervals around GISTIC peaks. It can be loaded as a track in a compatible genome browser such as IGV or the UCSC Genome Browser.
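
Outside a genome browser, a BED track can also be read directly. The sketch below relies only on the general BED conventions (chromosome, 0-based start, exclusive end, optional name, with `track` and `browser` lines as headers). The example lines are invented and are not copied from the actual regions_track output.

```python
def read_bed_intervals(lines):
    """Collect (chrom, start, end, name) tuples from BED-formatted lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        fields = line.split("\t") if "\t" in line else line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        intervals.append((chrom, start, end, name))
    return intervals


# Hypothetical track content for illustration.
example_lines = [
    'track name="GISTIC peaks (illustrative)"',
    "chr9\t21959089\t22010600\texample_deletion_peak",
]
intervals = read_bed_intervals(example_lines)
lengths = [end - start for _, start, end, _ in intervals]
print(intervals, lengths)
```

Because BED intervals are half-open, the interval length is simply the end minus the start.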

GISTIC

GISTIC identifies genomic regions that are significantly gained or lost across a set of tumors. It takes segmented copy number ratios as input, separates arm-level events from focal events, and then performs two tests: (i) it identifies significantly amplified or deleted chromosome arms; and (ii) it identifies regions that are significantly focally amplified or deleted. For the focal analysis, the significance levels (Q values) are calculated by comparing the observed gains/losses at each locus to those obtained by randomly permuting the events along the genome to reflect the null hypothesis that they are all 'passengers' and could have occurred anywhere. The locus-specific significance levels are then corrected for multiple hypothesis testing. The arm-level significance is calculated by comparing the frequency of gains/losses of each arm to the expected rate given its size. The method outputs genomic views of significantly amplified and deleted regions, as well as a table of genes with gain or loss scores. A more in-depth discussion of the GISTIC algorithm and its utility is given in [1], [3], and [5].
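
The multiple-testing step mentioned above converts locus-level p-values into Q values. This report does not name the correction procedure, so the sketch below assumes the Benjamini-Hochberg false discovery rate method. The p-values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


p_values = [0.01, 0.04, 0.03, 0.5]  # invented for illustration
q_values = benjamini_hochberg(p_values)
significant = [q < 0.25 for q in q_values]  # cutoff used throughout this report
print(q_values, significant)
```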

CNV Description

Regions of the genome that are prone to germ line variations in copy number are excluded from the GISTIC analysis using a list of germ line copy number variations (CNVs). A CNV is a DNA sequence that may be found at different copy numbers in the germ line of two different individuals. Such germ line variations can confound a GISTIC analysis, which finds significant somatic copy number variations in cancer. A more in-depth discussion is provided in [6]. GISTIC currently uses two CNV exclusion lists. One is based on the literature describing copy number variation, and the other comes from an analysis of significant variations among the blood normals in the TCGA data set.

Download Results

The full results of the analysis summarized in this report can be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or the TCGA Data Coordination Center Portal.

References
[1] Beroukhim et al., Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, Proc Natl Acad Sci U S A Vol. 104:50 (2007)
[3] Mermel et al., GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers, Genome Biology Vol. 12:4 (2011)
[5] Beroukhim et al., The landscape of somatic copy-number alteration across human cancers, Nature Vol. 463:7283 (2010)
[6] McCarroll, S. A. et al., Integrated detection and population-genetic analysis of SNPs and copy number variation, Nat Genet Vol. 40(10):1166-1174 (2008)