Correlation between mutation rate and clinical features
Pheochromocytoma and Paraganglioma (Primary solid tumor)
15 July 2014  |  analyses__2014_07_15
Maintainer Information
Maintained by Juok Cho (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2014): Correlation between mutation rate and clinical features. Broad Institute of MIT and Harvard. doi:10.7908/C1JW8CN3
Overview
Introduction

This pipeline uses various statistical tests to identify selected clinical features related to mutation rate.

Summary

Testing the association between 2 variables and 4 clinical features across 61 samples, with significance thresholds of P value < 0.05 and Q value < 0.3, 2 clinical features were related to at least one variable.

  • 1 variable correlated to 'AGE'.

    • MUTATIONRATE_NONSYNONYMOUS

  • 1 variable correlated to 'AGE_mutation.rate'.

    • MUTATIONRATE_NONSYNONYMOUS

  • No variables correlated to 'GENDER' or 'RACE'.

Results
Overview of the results

The complete statistical result table is provided in Supplement Table 1.

Table 1.  This table shows the clinical features, statistical methods used, and the number of variables that are significantly associated with each clinical feature at P value < 0.05 and Q value < 0.3.

Clinical feature  Statistical test            Significant variables  Associated with  Associated with
AGE               Spearman correlation test   N=1                    older N=1        younger N=0
AGE               Linear Regression Analysis  N=1
GENDER            Wilcoxon test               N=0
RACE              Kruskal-Wallis test         N=0
Clinical variable #1: 'AGE'

One variable related to 'AGE'.

Table S1.  Basic characteristics of clinical feature: 'AGE'

AGE Mean (SD) 49.31 (14)
  Significant variables N = 1
  pos. correlated 1
  neg. correlated 0
List of one variable associated with 'AGE'

Table S2.  List of one variable significantly correlated to 'AGE' by Spearman correlation test

Variable                    SpearmanCorr  corrP    Q
MUTATIONRATE_NONSYNONYMOUS  0.3252        0.01054  0.0211
Clinical variable #2: 'AGE'

One variable related to 'AGE'.

Table S3.  Basic characteristics of clinical feature: 'AGE'

AGE Mean (SD) 49.31 (14)
  Significant variables N = 1
List of one variable associated with 'AGE'

Table S4.  List of one variable significantly correlated to 'AGE' by linear regression analysis [lm (mutation rate ~ age)]. Whereas a correlation analysis tests for interdependence between two variables, a regression model describes the dependence of a response variable on one or more explanatory variables, assuming a one-way causal effect from the explanatory variable(s) to the response variable. A 'Residuals vs Fitted' plot (a standard residual plot) that shows a random pattern indicates that a linear model is a reasonable fit for the relationship between mutation rate and age. Adjusted R-squared (the fraction of total variation explained by the model, adjusted for the number of predictors) indicates the regression model's explanatory power.

Variable                    Adj.R.squared  F     P       Residual.std.err  DF  coef(intercept)     coef.p(intercept)
MUTATIONRATE_NONSYNONYMOUS  0.0623         4.99  0.0293  1.67e-07          59  3.5e-09 (2.31e-07)  0.0293 (0.00547)
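
As a rough consistency check on the regression row above, R-squared can be recovered from the F statistic and the residual degrees of freedom, and then adjusted for the number of predictors. The sketch below is written in Python purely for illustration (the report itself used R's 'lm'); the function name is hypothetical, and it assumes a single predictor (age), as in lm (mutation rate ~ age).

```python
def adjusted_r_squared_from_f(f_stat, df_resid, n_predictors=1):
    """Recover R-squared and adjusted R-squared from an overall F test.

    Assumes an ordinary least-squares model with an intercept, where
    F = (R2 / k) / ((1 - R2) / df_resid) and k is the number of predictors.
    """
    k = n_predictors
    r2 = (k * f_stat) / (k * f_stat + df_resid)
    n = df_resid + k + 1  # number of samples used in the fit
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    return r2, adj_r2


# Values taken from the table row above (F = 4.99, DF = 59).
r2, adj_r2 = adjusted_r_squared_from_f(4.99, 59)
print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

With the rounded F value, the result agrees with the reported Adj.R.squared of 0.0623 to within rounding, and DF = 59 is consistent with 61 samples and one predictor.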
Clinical variable #3: 'GENDER'

No variable related to 'GENDER'.

Table S5.  Basic characteristics of clinical feature: 'GENDER'

GENDER Labels N
  FEMALE 40
  MALE 21
     
  Significant variables N = 0
Clinical variable #4: 'RACE'

No variable related to 'RACE'.

Table S6.  Basic characteristics of clinical feature: 'RACE'

RACE Labels N
  AMERICAN INDIAN OR ALASKA NATIVE 1
  ASIAN 3
  BLACK OR AFRICAN AMERICAN 7
  WHITE 48
     
  Significant variables N = 0
Methods & Data
Input
  • Expression data file = PCPG-TP.patients.counts_and_rates.txt

  • Clinical data file = PCPG-TP.merged_data.txt

  • Number of patients = 61

  • Number of variables = 2

  • Number of clinical features = 4

Correlation analysis

For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
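
To make the statistic concrete, here is a minimal sketch of Spearman's rank correlation in plain Python: rank both variables (tied values share the mean of their ranks) and take the Pearson correlation of the ranks. This is an illustration only, not the pipeline's code; the function names and the sample values are hypothetical, and the two-tailed P value that 'cor.test' also reports is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages and mutation rates, for illustration only.
ages = [34, 51, 47, 62, 29, 70]
rates = [1.1e-7, 2.4e-7, 1.9e-7, 3.0e-7, 1.5e-7, 2.8e-7]
print(spearman_rho(ages, rates))
```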

Student's t-test analysis

For two-class clinical features, a two-tailed Student's t test with unequal variance (Lehmann and Romano 2005) was applied to compare the log2-expression levels between the two clinical classes using the 't.test' function in R.
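
The unequal-variance form of the t test described above is commonly called Welch's t test. The following Python sketch computes its test statistic and the Welch-Satterthwaite degrees of freedom; it is an illustration under that assumption, not the pipeline's code, and the function name and input values are hypothetical. Converting the statistic to a two-tailed P value requires the t distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values for two clinical classes, for illustration only.
class_a = [1.0, 2.0, 3.0, 4.0, 5.0]
class_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(class_a, class_b)
print(t_stat, dof)
```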

ANOVA analysis

For multi-class clinical features (ordinal or nominal), one-way analysis of variance (Howell 2002) was applied to compare the log2-expression levels between different clinical classes using the 'anova' function in R.
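
As an illustration of what one-way analysis of variance computes, the Python sketch below derives the F statistic as the ratio of between-class to within-class mean squares. The function name and the example groups are hypothetical; the report's own analysis was run in R, and the P value lookup from the F distribution is omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three clinical classes, for illustration only.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(example_groups))
```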

Q value calculation

For multiple hypothesis correction, the Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the 'Benjamini and Hochberg' method of the 'p.adjust' function in R to convert P values into Q values.
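
The Benjamini and Hochberg adjustment can be sketched in a few lines: sort the P values, scale each by n divided by its rank, and enforce monotonicity from the largest P value downward. The Python version below is an illustrative re-implementation intended to mirror the procedure, not the pipeline's code. In the example, 0.01054 is the Spearman P value reported for 'AGE', while the second P value (0.5) is a hypothetical stand-in, since the report does not list it.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    n = len(pvalues)
    # Visit P values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * n
    running_min = 1.0  # Q values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


# First value from the report; the second is hypothetical.
q = bh_adjust([0.01054, 0.5])
print(q)
```

With two variables tested, the smaller P value is doubled, which is consistent with the reported Q of 0.0211 for a P of 0.01054.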

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References
[1] Spearman, C., The proof and measurement of association between two things, Amer. J. Psychol. 15:72-101 (1904)
[2] Lehmann, E. L. and Romano, J. P., Testing Statistical Hypotheses (3rd ed.), New York: Springer, ISBN 0387988645 (2005)
[3] Howell, D., Statistical Methods for Psychology (5th ed.), Duxbury Press:324-5 (2002)
[4] Benjamini, Y. and Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B 57:289-300 (1995)