Correlation between mutation rate and clinical features
Kidney Chromophobe (Primary solid tumor)
15 July 2014  |  analyses__2014_07_15
Maintainer Information
Maintained by Juok Cho (Broad Institute)
Citation Information
Cite as: Broad Institute TCGA Genome Data Analysis Center (2014): Correlation between mutation rate and clinical features. Broad Institute of MIT and Harvard. doi:10.7908/C1JW8CMN
Overview
Introduction

This pipeline uses several statistical tests to identify clinical features that are associated with mutation rate.

Summary

The association between 2 variables (MUTATIONRATE_NONSYNONYMOUS and MUTATIONRATE_SILENT) and 11 clinical features was tested across 66 samples. Using thresholds of P value < 0.05 and Q value < 0.3, 6 clinical features were related to at least one variable.

  • 2 variables correlated to 'AGE'.

    • MUTATIONRATE_NONSYNONYMOUS ,  MUTATIONRATE_SILENT

  • 1 variable correlated to 'AGE_mutation.rate'.

    • MUTATIONRATE_NONSYNONYMOUS

  • 2 variables correlated to 'NEOPLASM.DISEASESTAGE'.

    • MUTATIONRATE_NONSYNONYMOUS ,  MUTATIONRATE_SILENT

  • 2 variables correlated to 'PATHOLOGY.T.STAGE'.

    • MUTATIONRATE_SILENT ,  MUTATIONRATE_NONSYNONYMOUS

  • 2 variables correlated to 'PATHOLOGY.N.STAGE'.

    • MUTATIONRATE_NONSYNONYMOUS ,  MUTATIONRATE_SILENT

  • 1 variable correlated to 'ETHNICITY'.

    • MUTATIONRATE_NONSYNONYMOUS

  • No variables correlated to 'PATHOLOGY.M.STAGE', 'GENDER', 'KARNOFSKY.PERFORMANCE.SCORE', 'NUMBERPACKYEARSSMOKED', and 'RACE'.
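The dual cut-off used throughout this report can be expressed as a simple filter. The Python sketch below is illustrative only: the helper name is_significant and the results dictionary are hypothetical, and the P and Q values are copied from Tables S2 and S17 rather than recomputed.

```python
# Hypothetical helper: a (clinical feature, variable) pair is reported as
# significant only if it passes BOTH cut-offs used in this report.
P_CUTOFF = 0.05
Q_CUTOFF = 0.3


def is_significant(p_value: float, q_value: float) -> bool:
    """Return True when P < 0.05 and Q < 0.3."""
    return p_value < P_CUTOFF and q_value < Q_CUTOFF


# (clinical feature, variable) -> (P, Q), copied from Tables S2 and S17.
results = {
    ("AGE", "MUTATIONRATE_NONSYNONYMOUS"): (0.0001863, 0.000373),
    ("AGE", "MUTATIONRATE_SILENT"): (0.006846, 0.00685),
    ("ETHNICITY", "MUTATIONRATE_NONSYNONYMOUS"): (0.04149, 0.083),
}

significant = [key for key, (p, q) in results.items() if is_significant(p, q)]
features = sorted({feature for feature, _ in significant})
print(features)  # ['AGE', 'ETHNICITY']
```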

Results
Overview of the results

The complete statistical result table is provided in Supplement Table 1.

Table 1.  This table shows the clinical features, statistical methods used, and the number of variables that are significantly associated with each clinical feature at P value < 0.05 and Q value < 0.3.

Clinical feature             Statistical test            Significant variables  Associated with             Associated with
AGE                          Spearman correlation test   N=2                    older N=2                   younger N=0
AGE                          Linear regression analysis  N=1
NEOPLASM DISEASESTAGE        Kruskal-Wallis test         N=2
PATHOLOGY T STAGE            Spearman correlation test   N=2                    higher stage N=2            lower stage N=0
PATHOLOGY N STAGE            Spearman correlation test   N=2                    higher stage N=2            lower stage N=0
PATHOLOGY M STAGE            Kruskal-Wallis test         N=0
GENDER                       Wilcoxon test               N=0
KARNOFSKY PERFORMANCE SCORE  Spearman correlation test   N=0
NUMBERPACKYEARSSMOKED        Spearman correlation test   N=0
RACE                         Kruskal-Wallis test         N=0
ETHNICITY                    Wilcoxon test               N=1                    not hispanic or latino N=1  hispanic or latino N=0
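Table 1 lists the Kruskal-Wallis test for the multi-class features. As a rough illustration (plain Python, not the pipeline's R code; the function names are hypothetical), the tie-corrected H statistic can be computed from pooled ranks as H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), divided by 1 - sum(t^3 - t) / (N^3 - N) over groups of t tied values. The P value would then be read from a chi-squared distribution with k - 1 degrees of freedom, as R's kruskal.test does; that step is omitted to keep the example dependency-free.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # convert 0-based positions to a 1-based rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by the usual correction factor for tied observations.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1


# Toy data (not from this report): three fully separated groups.
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 6), df)  # 7.2 2
```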
Clinical variable #1: 'AGE'

2 variables related to 'AGE'.

Table S1.  Basic characteristics of clinical feature: 'AGE'

AGE Mean (SD) 51.52 (14)
  Significant variables N = 2
  pos. correlated 2
  neg. correlated 0
List of 2 variables associated with 'AGE'

Table S2.  List of 2 variables significantly correlated to 'AGE' by Spearman correlation test

Variable                    SpearmanCorr  corrP      Q
MUTATIONRATE_NONSYNONYMOUS  0.4443        0.0001863  0.000373
MUTATIONRATE_SILENT         0.3298        0.006846   0.00685
Clinical variable #2: 'AGE'

One variable related to 'AGE'.

Table S3.  Basic characteristics of clinical feature: 'AGE'

AGE Mean (SD) 51.52 (14)
  Significant variables N = 1
List of one variable associated with 'AGE'

Table S4.  List of one variable significantly correlated to 'AGE' by linear regression analysis [lm (mutation rate ~ age)]. Whereas a correlation analysis tests for interdependence of two variables, a regression model describes the dependence of a response variable on one or more explanatory variables, assuming a one-way causal effect from the explanatory variable(s) to the response variable. If the 'Residuals vs Fitted' plot (a standard residual plot) shows a random pattern, indicating that a linear model fits well, the regression describes a linear relationship between mutation rate and age. Adj.R-squared (R-squared = explained variation / total variation, adjusted for the number of explanatory variables) indicates the explanatory power of the regression model. In the coef and coef.p columns, the value for age is followed by the value for the intercept in parentheses.

Variable                    Adj.R.squared  F     P       Residual.std.err  DF  coef (intercept)     coef.p (intercept)
MUTATIONRATE_NONSYNONYMOUS  0.0443         4.01  0.0494  2.47e-06          64  4.3e-08 (-4.64e-07)  0.0494 (0.687)
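The quantities in Table S4 follow from ordinary least squares with one explanatory variable. The sketch below is a plain-Python stand-in for R's lm, with hypothetical function names; it returns the slope, intercept, adjusted R-squared, F statistic and residual standard error. The second helper checks the reported row: for a single predictor R-squared = F / (F + DF), so F = 4.01 on DF = 64 implies an adjusted R-squared of about 0.0443.

```python
def simple_linear_regression(x, y):
    """OLS fit of y = intercept + slope * x with the summary statistics of Table S4."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    df_resid = n - 2  # one slope plus one intercept
    r2 = 1 - ss_res / ss_tot  # explained variation / total variation
    return {
        "slope": slope,
        "intercept": intercept,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df_resid,
        "F": (ss_tot - ss_res) / (ss_res / df_resid),
        "residual_std_err": (ss_res / df_resid) ** 0.5,
        "df": df_resid,
    }


def adj_r2_from_f(f_stat, df_resid):
    """For a single predictor, R^2 = F / (F + DF); then adjust using n = DF + 2."""
    r2 = f_stat / (f_stat + df_resid)
    n = df_resid + 2
    return 1 - (1 - r2) * (n - 1) / df_resid


# Consistency check against the Table S4 row (F = 4.01, DF = 64).
print(round(adj_r2_from_f(4.01, 64), 4))  # 0.0443
```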
Clinical variable #3: 'NEOPLASM.DISEASESTAGE'

2 variables related to 'NEOPLASM.DISEASESTAGE'.

Table S5.  Basic characteristics of clinical feature: 'NEOPLASM.DISEASESTAGE'

NEOPLASM.DISEASESTAGE Labels N
  STAGE I 21
  STAGE II 25
  STAGE III 14
  STAGE IV 6
     
  Significant variables N = 2
List of 2 variables associated with 'NEOPLASM.DISEASESTAGE'

Table S6.  List of 2 variables that differ significantly across 'NEOPLASM.DISEASESTAGE' classes

Variable                    ANOVA_P  Q
MUTATIONRATE_NONSYNONYMOUS  0.02758  0.035
MUTATIONRATE_SILENT         0.01748  0.035
Clinical variable #4: 'PATHOLOGY.T.STAGE'

2 variables related to 'PATHOLOGY.T.STAGE'.

Table S7.  Basic characteristics of clinical feature: 'PATHOLOGY.T.STAGE'

PATHOLOGY.T.STAGE Mean (SD) 2.02 (0.85)
  N
  1 21
  2 25
  3 18
  4 2
     
  Significant variables N = 2
  pos. correlated 2
  neg. correlated 0
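The Mean (SD) entries for the ordinal features can be recovered from their frequency tables. This is a minimal sketch, assuming the stage labels are coded as the integers shown and that SD uses the n - 1 denominator; under those assumptions it reproduces 2.02 (0.85) for PATHOLOGY.T.STAGE and 89.09 (9.44) for KARNOFSKY.PERFORMANCE.SCORE, which Table S13 reports to two significant figures as 9.4. The helper name is hypothetical.

```python
from math import sqrt


def mean_sd_from_counts(counts):
    """Mean and sample SD (n - 1 denominator) of a {value: count} table."""
    n = sum(counts.values())
    mean = sum(value * c for value, c in counts.items()) / n
    variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)
    return mean, sqrt(variance)


# Frequency tables copied from Tables S7 and S13 (T stage coded as 1-4).
t_stage = {1: 21, 2: 25, 3: 18, 4: 2}
karnofsky = {70: 1, 80: 2, 90: 5, 100: 3}

for name, table in [("PATHOLOGY.T.STAGE", t_stage), ("KARNOFSKY", karnofsky)]:
    mean, sd = mean_sd_from_counts(table)
    print(name, f"{mean:.2f} ({sd:.2f})")
# PATHOLOGY.T.STAGE 2.02 (0.85)
# KARNOFSKY 89.09 (9.44)
```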
List of 2 variables associated with 'PATHOLOGY.T.STAGE'

Table S8.  List of 2 variables significantly correlated to 'PATHOLOGY.T.STAGE' by Spearman correlation test

Variable                    SpearmanCorr  corrP      Q
MUTATIONRATE_SILENT         0.3581        0.003157   0.00631
MUTATIONRATE_NONSYNONYMOUS  0.2891        0.01856    0.0186
Clinical variable #5: 'PATHOLOGY.N.STAGE'

2 variables related to 'PATHOLOGY.N.STAGE'.

Table S9.  Basic characteristics of clinical feature: 'PATHOLOGY.N.STAGE'

PATHOLOGY.N.STAGE Mean (SD) 0.16 (0.47)
  N
  0 40
  1 3
  2 2
     
  Significant variables N = 2
  pos. correlated 2
  neg. correlated 0
List of 2 variables associated with 'PATHOLOGY.N.STAGE'

Table S10.  List of 2 variables significantly correlated to 'PATHOLOGY.N.STAGE' by Spearman correlation test

Variable                    SpearmanCorr  corrP      Q
MUTATIONRATE_NONSYNONYMOUS  0.4184        0.004231   0.00846
MUTATIONRATE_SILENT         0.354         0.01703    0.017
Clinical variable #6: 'PATHOLOGY.M.STAGE'

No variable related to 'PATHOLOGY.M.STAGE'.

Table S11.  Basic characteristics of clinical feature: 'PATHOLOGY.M.STAGE'

PATHOLOGY.M.STAGE Labels N
  M0 34
  M1 2
  MX 9
     
  Significant variables N = 0
Clinical variable #7: 'GENDER'

No variable related to 'GENDER'.

Table S12.  Basic characteristics of clinical feature: 'GENDER'

GENDER Labels N
  FEMALE 27
  MALE 39
     
  Significant variables N = 0
Clinical variable #8: 'KARNOFSKY.PERFORMANCE.SCORE'

No variable related to 'KARNOFSKY.PERFORMANCE.SCORE'.

Table S13.  Basic characteristics of clinical feature: 'KARNOFSKY.PERFORMANCE.SCORE'

KARNOFSKY.PERFORMANCE.SCORE Mean (SD) 89.09 (9.4)
  Score N
  70 1
  80 2
  90 5
  100 3
     
  Significant variables N = 0
Clinical variable #9: 'NUMBERPACKYEARSSMOKED'

No variable related to 'NUMBERPACKYEARSSMOKED'.

Table S14.  Basic characteristics of clinical feature: 'NUMBERPACKYEARSSMOKED'

NUMBERPACKYEARSSMOKED Mean (SD) 25.09 (22)
  Significant variables N = 0
Clinical variable #10: 'RACE'

No variable related to 'RACE'.

Table S15.  Basic characteristics of clinical feature: 'RACE'

RACE Labels N
  ASIAN 2
  BLACK OR AFRICAN AMERICAN 4
  WHITE 58
     
  Significant variables N = 0
Clinical variable #11: 'ETHNICITY'

One variable related to 'ETHNICITY'.

Table S16.  Basic characteristics of clinical feature: 'ETHNICITY'

ETHNICITY Labels N
  HISPANIC OR LATINO 4
  NOT HISPANIC OR LATINO 32
     
  Significant variables N = 1
  Higher in NOT HISPANIC OR LATINO 1
  Higher in HISPANIC OR LATINO 0
List of one variable associated with 'ETHNICITY'

Table S17.  List of one variable that differs significantly between 'ETHNICITY' groups

Variable                    W (positive if higher in 'NOT HISPANIC OR LATINO')  Wilcoxon test P  Q      AUC
MUTATIONRATE_NONSYNONYMOUS  105                                                 0.04149          0.083  0.8203
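The AUC column is the W statistic rescaled by the number of cross-group patient pairs, AUC = W / (n1 * n2). With n1 = 32 NOT HISPANIC OR LATINO and n2 = 4 HISPANIC OR LATINO patients (Table S16), 105 / 128 = 0.8203, matching Table S17. A minimal sketch with hypothetical function names, computing W in its pair-counting (Mann-Whitney) form, which equals the W reported by R's wilcox.test:

```python
def wilcoxon_w(x, y):
    """Rank-sum statistic in Mann-Whitney form: pairs with x > y, ties count 1/2."""
    w = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                w += 1.0
            elif xi == yj:
                w += 0.5
    return w


def auc_from_w(w, n_x, n_y):
    """Fraction of (x, y) pairs in which x is larger: W / (n_x * n_y)."""
    return w / (n_x * n_y)


# Table S17: W = 105 with 32 NOT HISPANIC OR LATINO and 4 HISPANIC OR LATINO patients.
print(round(auc_from_w(105, 32, 4), 4))  # 0.8203
```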
Methods & Data
Input
  • Expression data file = KICH-TP.patients.counts_and_rates.txt

  • Clinical data file = KICH-TP.merged_data.txt

  • Number of patients = 66

  • Number of variables = 2

  • Number of clinical features = 11

Correlation analysis

For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
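Spearman's coefficient is Pearson's correlation applied to ranks, with tied values given their average rank. The plain-Python sketch below (hypothetical function names, a stand-in for cor.test(x, y, method = "spearman")) computes rho and the statistic t = rho * sqrt((n - 2) / (1 - rho^2)), which a large-sample approximation refers to a t distribution with n - 2 degrees of freedom. R's cor.test may use a different P-value method for small samples without ties, and the P value itself is not computed here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_t(rho, n):
    """Large-sample t statistic (n - 2 degrees of freedom) for testing rho = 0."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))


# Toy data (not from this report).
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 6), round(spearman_t(rho, 5), 4))  # 0.8 2.3094
```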

ANOVA analysis

For multi-class clinical features (ordinal or nominal), one-way analysis of variance (Howell 2002) was applied to compare the log2-expression levels between different clinical classes using the 'anova' function in R.
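A one-way ANOVA compares between-class and within-class variation through F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k classes and N observations. The sketch below is a dependency-free stand-in for the R call, with a hypothetical function name; it assumes any log2 transform has already been applied to the inputs and omits the P value, which comes from the F distribution with (k - 1, N - k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (not from this report): three classes of three observations.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```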

Student's t-test analysis

For two-class clinical features, a two-tailed t test with unequal variances (Welch's t test; Lehmann and Romano 2005) was applied to compare the log2-expression levels between the two clinical classes using the 't.test' function in R.
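With unequal variances the test statistic is Welch's t, t = (mean_x - mean_y) / sqrt(s_x^2 / n_x + s_y^2 / n_y), with degrees of freedom from the Welch-Satterthwaite approximation; this is also what R's t.test computes by default (var.equal = FALSE). A minimal sketch with a hypothetical function name, P value omitted:

```python
from math import sqrt


def welch_t(x, y):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sx, sy = vx / nx, vy / ny  # squared standard errors of each mean
    t = (mx - my) / sqrt(sx + sy)
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Toy data (not from this report).
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t, 4), round(df, 4))  # -1.7321 4.4118
```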

Q value calculation

For multiple hypothesis correction, the Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the Benjamini-Hochberg method ("BH") of the 'p.adjust' function in R to convert P values into Q values.
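The conversion can be sketched in plain Python as a stand-in for R's p.adjust(p, method = "BH") (the function name is hypothetical): scale each P value by m / rank, then take a running minimum from the largest P value down so the Q values stay monotone. Applied to the P values in Table S2 it gives 0.0003726 and 0.006846, which round to the reported 0.000373 and 0.00685.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (Q values), in the input order."""
    m = len(p_values)
    # Walk from the largest P value to the smallest so a running minimum
    # keeps the Q values monotone in P.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0  # Q values are capped at 1
    for step, i in enumerate(order):
        rank = m - step  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# P values from Table S2 (AGE, Spearman correlation test).
q = benjamini_hochberg([0.0001863, 0.006846])
print([round(v, 7) for v in q])  # [0.0003726, 0.006846]
```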

Download Results

The full results of the analysis summarized in this report can be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or the TCGA Data Coordination Center Portal.

References
[1] Spearman, C. The proof and measurement of association between two things. American Journal of Psychology 15:72-101 (1904).
[2] Howell, D. Statistical Methods for Psychology, 5th ed. Duxbury Press, pp. 324-325 (2002).
[3] Lehmann, E. L. and Romano, J. P. Testing Statistical Hypotheses, 3rd ed. New York: Springer. ISBN 0387988645 (2005).
[4] Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57:289-300 (1995).