Correlation between gene methylation status and clinical features
Head and Neck Squamous Cell Carcinoma (Primary solid tumor)
28 January 2016  |  analyses__2016_01_28
Maintainer Information
Maintained by Juok Cho (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2016): Correlation between gene methylation status and clinical features. Broad Institute of MIT and Harvard. doi:10.7908/C1G1606Q
Overview
Introduction

This pipeline uses various statistical tests to identify genes whose promoter methylation levels correlate with selected clinical features. The input file "HNSC-TP.meth.by_min_clin_corr.data.txt" is generated by the Methylation_Preprocess pipeline in the stddata run.

Summary

Testing the association between 17128 genes and 14 clinical features across 528 samples, with statistical thresholds of P value < 0.05 and Q value < 0.3, found that 12 clinical features were related to at least one gene.

  • 30 genes correlated to 'DAYS_TO_DEATH_OR_LAST_FUP'.

    • CDK5R1 ,  DAD1 ,  GZMM ,  CHAC2 ,  SPIB ,  ...

  • 30 genes correlated to 'YEARS_TO_BIRTH'.

    • KLF14 ,  KIF15 ,  SLC15A3 ,  FIGNL1 ,  IFT140 ,  ...

  • 30 genes correlated to 'PATHOLOGIC_STAGE'.

    • PTK2 ,  LOC284688 ,  CCL11 ,  UTP3 ,  NDUFB9 ,  ...

  • 30 genes correlated to 'PATHOLOGY_T_STAGE'.

    • IRX2 ,  ABL2 ,  SMURF1 ,  HMOX2 ,  MAP4K4 ,  ...

  • 30 genes correlated to 'PATHOLOGY_N_STAGE'.

    • SLC47A2 ,  C19ORF66 ,  DSG1 ,  HCST ,  AVPI1 ,  ...

  • 30 genes correlated to 'GENDER'.

    • KIF4B ,  FRG1B ,  MRPL32 ,  NCRNA00116 ,  NPDC1 ,  ...

  • 30 genes correlated to 'RADIATION_THERAPY'.

    • IGSF9 ,  SDHD ,  NCAPH ,  LRWD1 ,  C19ORF25 ,  ...

  • 30 genes correlated to 'HISTOLOGICAL_TYPE'.

    • CALML3 ,  ARHGAP23 ,  KIRREL2 ,  C15ORF56 ,  ARSG ,  ...

  • 30 genes correlated to 'NUMBER_PACK_YEARS_SMOKED'.

    • ZMYND8 ,  SCN9A ,  THAP2 ,  CAND1 ,  ARMC7 ,  ...

  • 30 genes correlated to 'YEAR_OF_TOBACCO_SMOKING_ONSET'.

    • KLF14 ,  RRP15 ,  TIGD4 ,  LSM7 ,  INSM2 ,  ...

  • 30 genes correlated to 'NUMBER_OF_LYMPH_NODES'.

    • HSPB3 ,  SLC13A4 ,  FUT7 ,  NBLA00301 ,  DSG1 ,  ...

  • 30 genes correlated to 'RACE'.

    • SCAMP5 ,  GPBAR1 ,  WBSCR27 ,  TP53 ,  ATAD3A ,  ...

  • No genes correlated to 'PATHOLOGY_M_STAGE' or 'ETHNICITY'.

Results
Overview of the results

The complete statistical result table is provided in Supplement Table 1.

Table 1.  This table shows the clinical features, statistical methods used, and the number of genes that are significantly associated with each clinical feature at P value < 0.05 and Q value < 0.3.

Clinical feature               Statistical test           Significant genes  Associated with                           Associated with
DAYS_TO_DEATH_OR_LAST_FUP      Cox regression test        N=30               N=NA                                      N=NA
YEARS_TO_BIRTH                 Spearman correlation test  N=30               older N=26                                younger N=4
PATHOLOGIC_STAGE               Kruskal-Wallis test        N=30
PATHOLOGY_T_STAGE              Spearman correlation test  N=30               higher stage N=0                          lower stage N=30
PATHOLOGY_N_STAGE              Spearman correlation test  N=30               higher stage N=30                         lower stage N=0
PATHOLOGY_M_STAGE              Wilcoxon test              N=0
GENDER                         Wilcoxon test              N=30               male N=30                                 female N=0
RADIATION_THERAPY              Wilcoxon test              N=30               yes N=30                                  no N=0
HISTOLOGICAL_TYPE              Kruskal-Wallis test        N=30
NUMBER_PACK_YEARS_SMOKED       Spearman correlation test  N=30               higher number_pack_years_smoked N=5       lower number_pack_years_smoked N=25
YEAR_OF_TOBACCO_SMOKING_ONSET  Spearman correlation test  N=30               higher year_of_tobacco_smoking_onset N=8  lower year_of_tobacco_smoking_onset N=22
NUMBER_OF_LYMPH_NODES          Spearman correlation test  N=30               higher number_of_lymph_nodes N=30         lower number_of_lymph_nodes N=0
RACE                           Kruskal-Wallis test        N=30
ETHNICITY                      Wilcoxon test              N=0
Clinical variable #1: 'DAYS_TO_DEATH_OR_LAST_FUP'

30 genes related to 'DAYS_TO_DEATH_OR_LAST_FUP'.

Table S1.  Basic characteristics of clinical feature: 'DAYS_TO_DEATH_OR_LAST_FUP'

DAYS_TO_DEATH_OR_LAST_FUP Duration (Months) 0.1-211 (median=21.2)
  censored N = 304
  death N = 223
     
  Significant markers N = 30
  associated with shorter survival NA
  associated with longer survival NA
List of top 10 genes associated with 'DAYS_TO_DEATH_OR_LAST_FUP'

Table S2.  List of top 10 genes significantly associated with 'Time to Death' by Cox regression test. For the survival curves, quantile intervals at c(0, 0.25, 0.50, 0.75, 1) were compared; survival analysis was not attempted if there was only one interval.

Gene logrank_P Q C_index
CDK5R1 5.24e-06 0.04 0.546
DAD1 6.43e-06 0.04 0.419
GZMM 6.99e-06 0.04 0.601
CHAC2 1.34e-05 0.057 0.407
SPIB 2.37e-05 0.081 0.599
FTCD 3.35e-05 0.083 0.594
ESCO2 3.95e-05 0.083 0.576
AK3L1 4.01e-05 0.083 0.405
LTK 5.11e-05 0.083 0.552
CCKBR 5.38e-05 0.083 0.561
Clinical variable #2: 'YEARS_TO_BIRTH'

30 genes related to 'YEARS_TO_BIRTH'.

Table S3.  Basic characteristics of clinical feature: 'YEARS_TO_BIRTH'

YEARS_TO_BIRTH Mean (SD) 60.91 (12)
  Significant markers N = 30
  pos. correlated 26
  neg. correlated 4
List of top 10 genes correlated with 'YEARS_TO_BIRTH'

Table S4.  List of top 10 genes significantly correlated to 'YEARS_TO_BIRTH' by Spearman correlation test

Gene SpearmanCorr corrP Q
KLF14 0.312 2.3e-13 3.94e-09
KIF15 0.2661 5.448e-10 4.67e-06
SLC15A3 0.2284 1.145e-07 0.000523
FIGNL1 0.2279 1.221e-07 0.000523
IFT140 0.2125 8.503e-07 0.00291
TRIM45 0.2097 1.194e-06 0.00291
PRR18 0.2089 1.312e-06 0.00291
NAPEPLD 0.2086 1.361e-06 0.00291
GBA -0.2019 2.983e-06 0.00432
NRIP3 0.2015 3.128e-06 0.00432
Clinical variable #3: 'PATHOLOGIC_STAGE'

30 genes related to 'PATHOLOGIC_STAGE'.

Table S5.  Basic characteristics of clinical feature: 'PATHOLOGIC_STAGE'

PATHOLOGIC_STAGE Labels N
  STAGE I 27
  STAGE II 77
  STAGE III 82
  STAGE IVA 257
  STAGE IVB 12
  STAGE IVC 1
     
  Significant markers N = 30
List of top 10 genes differentially methylated by 'PATHOLOGIC_STAGE'

Table S6.  List of top 10 genes differentially methylated by 'PATHOLOGIC_STAGE'

Gene kruskal_wallis_P Q
PTK2 1.289e-05 0.113
LOC284688 1.317e-05 0.113
CCL11 3.51e-05 0.188
UTP3 6.007e-05 0.188
NDUFB9 6.221e-05 0.188
PARL 6.57e-05 0.188
C8ORF44 7.741e-05 0.189
CCDC129 0.0001058 0.202
RUSC1 0.0001106 0.202
ZFYVE20 0.0001332 0.202
Clinical variable #4: 'PATHOLOGY_T_STAGE'

30 genes related to 'PATHOLOGY_T_STAGE'.

Table S7.  Basic characteristics of clinical feature: 'PATHOLOGY_T_STAGE'

PATHOLOGY_T_STAGE Mean (SD) 2.86 (1)
  N
  T0 1
  T1 49
  T2 140
  T3 101
  T4 175
     
  Significant markers N = 30
  pos. correlated 0
  neg. correlated 30
List of top 10 genes correlated with 'PATHOLOGY_T_STAGE'

Table S8.  List of top 10 genes significantly correlated to 'PATHOLOGY_T_STAGE' by Spearman correlation test

Gene SpearmanCorr corrP Q
IRX2 -0.2603 1.178e-08 0.000174
ABL2 -0.2562 2.029e-08 0.000174
SMURF1 -0.2513 3.835e-08 0.000219
HMOX2 -0.2474 6.257e-08 0.000268
MAP4K4 -0.2436 1.008e-07 0.000345
THEM5 -0.2361 2.527e-07 0.000721
CORO1C -0.2311 4.575e-07 0.00112
CCL4 -0.2254 8.876e-07 0.00181
SDCCAG8 -0.2245 9.775e-07 0.00181
OR1J2 -0.2234 1.104e-06 0.00181
Clinical variable #5: 'PATHOLOGY_N_STAGE'

30 genes related to 'PATHOLOGY_N_STAGE'.

Table S9.  Basic characteristics of clinical feature: 'PATHOLOGY_N_STAGE'

PATHOLOGY_N_STAGE Mean (SD) 1.02 (0.95)
  N
  N0 180
  N1 68
  N2 172
  N3 8
     
  Significant markers N = 30
  pos. correlated 30
  neg. correlated 0
List of top 10 genes correlated with 'PATHOLOGY_N_STAGE'

Table S10.  List of top 10 genes significantly correlated to 'PATHOLOGY_N_STAGE' by Spearman correlation test

Gene SpearmanCorr corrP Q
SLC47A2 0.2605 4.574e-08 0.000535
C19ORF66 0.2558 8.009e-08 0.000535
DSG1 0.2545 9.368e-08 0.000535
HCST 0.2355 8.557e-07 0.00366
AVPI1 0.2314 1.307e-06 0.00448
EFEMP1 0.2265 2.193e-06 0.00594
FUT7 0.2256 2.426e-06 0.00594
MMP21 0.2233 3.089e-06 0.00617
RAB1A 0.2228 3.241e-06 0.00617
GLB1L3 0.2189 4.857e-06 0.00769
Clinical variable #6: 'PATHOLOGY_M_STAGE'

No gene related to 'PATHOLOGY_M_STAGE'.

Table S11.  Basic characteristics of clinical feature: 'PATHOLOGY_M_STAGE'

PATHOLOGY_M_STAGE Labels N
  class0 191
  class1 1
     
  Significant markers N = 0
Clinical variable #7: 'GENDER'

30 genes related to 'GENDER'.

Table S12.  Basic characteristics of clinical feature: 'GENDER'

GENDER Labels N
  FEMALE 142
  MALE 386
     
  Significant markers N = 30
  Higher in MALE 30
  Higher in FEMALE 0
List of top 10 genes differentially methylated by 'GENDER'

Table S13.  List of top 10 genes differentially methylated by 'GENDER'. 0 significant genes located on sex chromosomes were filtered out.

Gene W(pos if higher in 'MALE') wilcoxontestP Q AUC
KIF4B 8775 4.242e-33 7.27e-29 0.8399
FRG1B 12720 3.473e-21 2.97e-17 0.7679
MRPL32 13946 4.771e-18 2.72e-14 0.7456
NCRNA00116 16299 9e-13 3.85e-09 0.7026
NPDC1 16828 1.013e-11 3.47e-08 0.693
TSSK6 16936 2.02e-11 5.77e-08 0.6902
CCDC121 17069 2.938e-11 7.19e-08 0.6886
KRTCAP3 17126 3.767e-11 8.06e-08 0.6876
ZNF839 17470 1.641e-10 2.98e-07 0.6813
BRCA1 17484 1.74e-10 2.98e-07 0.681
Clinical variable #8: 'RADIATION_THERAPY'

30 genes related to 'RADIATION_THERAPY'.

Table S14.  Basic characteristics of clinical feature: 'RADIATION_THERAPY'

RADIATION_THERAPY Labels N
  NO 163
  YES 303
     
  Significant markers N = 30
  Higher in YES 30
  Higher in NO 0
List of top 10 genes differentially methylated by 'RADIATION_THERAPY'

Table S15.  List of top 10 genes differentially methylated by 'RADIATION_THERAPY'

Gene W(pos if higher in 'YES') wilcoxontestP Q AUC
IGSF9 16823 1.368e-08 0.000234 0.6594
SDHD 30920 7.119e-06 0.0409 0.6261
NCAPH 18471 7.167e-06 0.0409 0.626
LRWD1 18615 1.161e-05 0.0421 0.6231
C19ORF25 18632 1.228e-05 0.0421 0.6228
APOBEC3C 18699 1.531e-05 0.0437 0.6214
GTPBP3 18786 2.031e-05 0.0473 0.6196
IL23A 18812 2.208e-05 0.0473 0.6191
NANP 18897 2.897e-05 0.0551 0.6174
OTUD7A 19047 4.637e-05 0.0666 0.6143
Clinical variable #9: 'HISTOLOGICAL_TYPE'

30 genes related to 'HISTOLOGICAL_TYPE'.

Table S16.  Basic characteristics of clinical feature: 'HISTOLOGICAL_TYPE'

HISTOLOGICAL_TYPE Labels N
  HEAD & NECK SQUAMOUS CELL CARCINOMA 517
  HEAD & NECK SQUAMOUS CELL CARCINOMA BASALOID TYPE 10
  HEAD & NECK SQUAMOUS CELL CARCINOMA, SPINDLE CELL VARIANT 1
     
  Significant markers N = 30
List of top 10 genes differentially methylated by 'HISTOLOGICAL_TYPE'

Table S17.  List of top 10 genes differentially methylated by 'HISTOLOGICAL_TYPE'

Gene kruskal_wallis_P Q
CALML3 1.605e-05 0.0315
ARHGAP23 1.637e-05 0.0315
KIRREL2 2.064e-05 0.0315
C15ORF56 2.091e-05 0.0315
ARSG 2.111e-05 0.0315
RALGDS 2.133e-05 0.0315
HAUS8 2.163e-05 0.0315
RPP21 2.202e-05 0.0315
DHFR 2.513e-05 0.0315
E2F1 2.772e-05 0.0315
Clinical variable #10: 'NUMBER_PACK_YEARS_SMOKED'

30 genes related to 'NUMBER_PACK_YEARS_SMOKED'.

Table S18.  Basic characteristics of clinical feature: 'NUMBER_PACK_YEARS_SMOKED'

NUMBER_PACK_YEARS_SMOKED Mean (SD) 45.75 (35)
  Significant markers N = 30
  pos. correlated 5
  neg. correlated 25
List of top 10 genes correlated with 'NUMBER_PACK_YEARS_SMOKED'

Table S19.  List of top 10 genes significantly correlated to 'NUMBER_PACK_YEARS_SMOKED' by Spearman correlation test

Gene SpearmanCorr corrP Q
ZMYND8 -0.3002 1.272e-07 0.00218
SCN9A -0.2835 6.506e-07 0.00557
THAP2 -0.2767 1.23e-06 0.00702
CAND1 -0.2691 2.442e-06 0.0105
ARMC7 -0.2636 3.956e-06 0.0119
KLF6 0.2619 4.582e-06 0.0119
LIN28B -0.2625 4.882e-06 0.0119
LOC100287227 -0.2597 5.565e-06 0.0119
LIMD2 0.2554 8.035e-06 0.0124
ZFYVE1 -0.2548 8.449e-06 0.0124
Clinical variable #11: 'YEAR_OF_TOBACCO_SMOKING_ONSET'

30 genes related to 'YEAR_OF_TOBACCO_SMOKING_ONSET'.

Table S20.  Basic characteristics of clinical feature: 'YEAR_OF_TOBACCO_SMOKING_ONSET'

YEAR_OF_TOBACCO_SMOKING_ONSET Mean (SD) 1967.31 (13)
  Significant markers N = 30
  pos. correlated 8
  neg. correlated 22
List of top 10 genes correlated with 'YEAR_OF_TOBACCO_SMOKING_ONSET'

Table S21.  List of top 10 genes significantly correlated to 'YEAR_OF_TOBACCO_SMOKING_ONSET' by Spearman correlation test

Gene SpearmanCorr corrP Q
KLF14 -0.3856 1.97e-11 3.38e-07
RRP15 -0.2811 1.61e-06 0.0101
TIGD4 -0.2795 1.866e-06 0.0101
LSM7 -0.2764 2.44e-06 0.0101
INSM2 -0.2742 2.962e-06 0.0101
RPS2 -0.27 4.248e-06 0.0121
C17ORF55 -0.2643 6.839e-06 0.0136
CFH 0.2648 7.37e-06 0.0136
ZNF193 -0.2607 9.156e-06 0.0136
PPP1R12C -0.2601 9.668e-06 0.0136
Clinical variable #12: 'NUMBER_OF_LYMPH_NODES'

30 genes related to 'NUMBER_OF_LYMPH_NODES'.

Table S22.  Basic characteristics of clinical feature: 'NUMBER_OF_LYMPH_NODES'

NUMBER_OF_LYMPH_NODES Mean (SD) 2.19 (4.3)
  Significant markers N = 30
  pos. correlated 30
  neg. correlated 0
List of top 10 genes correlated with 'NUMBER_OF_LYMPH_NODES'

Table S23.  List of top 10 genes significantly correlated to 'NUMBER_OF_LYMPH_NODES' by Spearman correlation test

Gene SpearmanCorr corrP Q
HSPB3 0.2846 3.912e-09 6.7e-05
SLC13A4 0.267 3.592e-08 0.000291
FUT7 0.2628 5.929e-08 0.000291
NBLA00301 0.261 7.387e-08 0.000291
DSG1 0.2598 8.498e-08 0.000291
FAM185A 0.2557 1.373e-07 0.000392
P2RY6 0.2516 2.192e-07 0.000444
NRGN 0.2514 2.252e-07 0.000444
HCST 0.2511 2.331e-07 0.000444
MAFB 0.2498 2.716e-07 0.00046
Clinical variable #13: 'RACE'

30 genes related to 'RACE'.

Table S24.  Basic characteristics of clinical feature: 'RACE'

RACE Labels N
  AMERICAN INDIAN OR ALASKA NATIVE 2
  ASIAN 11
  BLACK OR AFRICAN AMERICAN 48
  WHITE 452
     
  Significant markers N = 30
List of top 10 genes differentially methylated by 'RACE'

Table S25.  List of top 10 genes differentially methylated by 'RACE'

Gene kruskal_wallis_P Q
SCAMP5 3.306e-13 5.66e-09
GPBAR1 3.523e-10 3.02e-06
WBSCR27 1.104e-07 0.000526
TP53 1.227e-07 0.000526
ATAD3A 2.517e-07 0.000862
LOC100133161 4.142e-07 0.00115
EIF3D 4.696e-07 0.00115
TMTC3 1.23e-06 0.00263
LOC253039 1.853e-06 0.00347
RASIP1 2.028e-06 0.00347
Clinical variable #14: 'ETHNICITY'

No gene related to 'ETHNICITY'.

Table S26.  Basic characteristics of clinical feature: 'ETHNICITY'

ETHNICITY Labels N
  HISPANIC OR LATINO 26
  NOT HISPANIC OR LATINO 465
     
  Significant markers N = 0
Methods & Data
Input
  • Expression data file = HNSC-TP.meth.by_min_clin_corr.data.txt

  • Clinical data file = HNSC-TP.merged_data.txt

  • Number of patients = 528

  • Number of genes = 17128

  • Number of clinical features = 14

Selected clinical features
  • For further details on the clinical features selected for this analysis, see the documentation on selected CDEs (Clinical Data Elements). The first column of the file is a formula to convert values and the second column is a clinical parameter name.

  • Survival time data

    • Survival time data is a combined value of days_to_death and days_to_last_followup. For each patient, a combined value 'days_to_death_or_last_fup' is created using the conversion process below.

      • If 'vital_status'==1 (dead), 'days_to_last_followup' is always NA, so the 'days_to_death' value is used for 'days_to_death_or_last_fup'.

      • If 'vital_status'==0 (alive):

        • If 'days_to_death'==NA & 'days_to_last_followup'!=NA, the 'days_to_last_followup' value is used for 'days_to_death_or_last_fup'.

        • If 'days_to_death'!=NA, the case is excluded from the survival analysis and reported.

      • If 'vital_status'==NA, the case is excluded from the survival analysis and reported.

    • Note: in certain disease types such as SKCM, the days_to_death parameter is replaced with time_from_specimen_dx or time_from_specimen_procurement_to_death.

  • This analysis excluded clinical variables that have only NA values.
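
The survival-time conversion rules listed above can be sketched in Python as follows. This is an illustration only, not the pipeline's code: the function name, the use of None for NA, and the returned reason strings are hypothetical, and the two branches marked as assumptions cover cases the rules do not spell out.

```python
from typing import Optional, Tuple


def days_to_death_or_last_fup(
    vital_status: Optional[int],
    days_to_death: Optional[float],
    days_to_last_followup: Optional[float],
) -> Tuple[Optional[float], Optional[str]]:
    """Return (combined_time, None) or (None, reason_for_exclusion)."""
    if vital_status is None:
        # vital_status == NA: exclude from survival analysis and report.
        return None, "vital_status is NA"
    if vital_status == 1:
        # Dead: days_to_last_followup is always NA, so use days_to_death.
        if days_to_death is None:
            # Assumption: not covered by the stated rules; excluded here.
            return None, "dead but days_to_death is NA"
        return days_to_death, None
    if vital_status == 0:
        if days_to_death is not None:
            # Alive with a death time recorded: exclude and report.
            return None, "alive but days_to_death is not NA"
        if days_to_last_followup is None:
            # Assumption: not covered by the stated rules; excluded here.
            return None, "alive but days_to_last_followup is NA"
        return days_to_last_followup, None
    # Assumption: any other vital_status code is treated as unusable.
    return None, "unexpected vital_status"
```

Returning the exclusion reason alongside the value makes it easy to collect the excluded cases for the report, as the rules require.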

Survival analysis

For survival clinical features, the logrank test in a univariate Cox regression analysis with a proportional hazards model (Andersen and Gill 1982) was used to estimate P values, comparing quantile intervals, using the 'coxph' function in R. Kaplan-Meier survival curves were plotted using quantile intervals at c(0, 0.25, 0.50, 0.75, 1). If there was only one interval group, survival analysis was not attempted.
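
As a rough illustration of the quantile-interval grouping described above, the following Python sketch assigns each methylation value to an interval bounded by the quantiles at 0, 0.25, 0.50, 0.75 and 1, and returns None when only one interval exists. The function names are hypothetical. It assumes the linear-interpolation quantile definition that R's 'quantile' uses by default and right-closed intervals with the lowest value included; the report does not state either choice.

```python
from bisect import bisect_left


def quantile_type7(sorted_vals, p):
    """Linear-interpolation quantile (the default definition in R's quantile)."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_groups(values, probs=(0, 0.25, 0.50, 0.75, 1)):
    """Return a group index per value, or None if there is only one interval."""
    s = sorted(values)
    # Duplicate break points (from tied values) are collapsed.
    breaks = sorted(set(quantile_type7(s, p) for p in probs))
    if len(breaks) < 3:
        # Two or fewer distinct breaks define at most one interval: skip.
        return None
    inner = breaks[1:-1]
    # bisect_left gives right-closed intervals: (b[i], b[i+1]], lowest included.
    return [bisect_left(inner, v) for v in values]
```

For example, `quantile_groups(list(range(1, 9)))` yields four groups of two values each, while a gene with identical values in every sample yields None and would be skipped.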

Correlation analysis

For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
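
To make the coefficient concrete, here is a minimal Python sketch of Spearman's rank correlation: both variables are converted to ranks (tied values share their average rank) and the Pearson correlation of the ranks is taken. The function names are hypothetical, the P value is not computed, and the inputs are assumed to be paired, complete, and non-constant.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)
```

A positive coefficient corresponds to the "pos. correlated" counts in the tables above (for example, methylation increasing with age), and a negative one to "neg. correlated".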

Wilcoxon rank sum test (Mann-Whitney U test)

For continuous data compared between two groups (in this report, the two classes of a binary clinical feature such as 'GENDER'), the Wilcoxon rank sum test (Mann and Whitney, 1947) was applied to compare the groups using the 'wilcox.test(continuous.clinical ~ as.factor(group), exact=FALSE)' function in R. This test is equivalent to the Mann-Whitney U test.
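
The rank sum statistic and its relation to an AUC can be sketched as below. This is a hedged illustration with hypothetical function names: it computes only the statistic (the number of between-group pairs in which the first group's value is larger, with ties counting one half), not the P value, and it uses a simple quadratic loop rather than ranks.

```python
def mann_whitney_u(group_a, group_b):
    """Count pairs (a, b) with a > b; tied pairs count as one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auc_from_u(u, n_a, n_b):
    """Fraction of between-group pairs in which group_a has the larger value."""
    return u / (n_a * n_b)
```

The AUC values in the tables above appear consistent with this relation. For KIF4B in Table S13, 1 - 8775 / (142 × 386) ≈ 0.8399, which matches the reported AUC if all 142 female and 386 male samples had methylation values for that gene; this is an observation from the reported numbers, not a documented formula of the pipeline.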

Q value calculation

For multiple hypothesis correction, Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the 'Benjamini and Hochberg' method of the 'p.adjust' function in R to convert P values into Q values.
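
The Benjamini and Hochberg conversion from P values to Q values can be sketched in Python as follows, together with the P value < 0.05 and Q value < 0.3 thresholds used in this report. The function names are hypothetical; the adjustment follows the standard step-up procedure (each P value is scaled by n / rank, then a running minimum is taken from the largest P value downward).

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    n = len(pvalues)
    # Visit P values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank of this P value in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


def significant(pvalues, p_cut=0.05, q_cut=0.3):
    """Flag tests passing both thresholds used in this report."""
    q = bh_adjust(pvalues)
    return [p < p_cut and qi < q_cut for p, qi in zip(pvalues, q)]
```

As a sanity check against the tables above, the smallest P value for 'YEARS_TO_BIRTH' (KLF14, 2.3e-13) multiplied by the 17128 genes tested gives about 3.94e-09, which matches the reported Q value.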

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References
[1] Andersen and Gill, Cox's regression model for counting processes, a large sample study, Annals of Statistics 10(4):1100-1120 (1982)
[2] Spearman, C, The proof and measurement of association between two things, Amer. J. Psychol 15:72-101 (1904)
[3] Mann and Whitney, On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, Annals of Mathematical Statistics 18 (1), 50-60 (1947)
[4] Benjamini and Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B 57:289-300 (1995)