Correlation between mutation rate and clinical features

Kidney Chromophobe (Primary solid tumor)

15 July 2014 | analyses__2014_07_15

Maintainer Information

Maintained by Juok Cho (Broad Institute)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2014): Correlation between mutation rate and clinical features. Broad Institute of MIT and Harvard. doi:10.7908/C1JW8CMN

Overview

Introduction

This pipeline uses various statistical tests to identify selected clinical features related to mutation rate.

Summary

Testing the association between 2 variables and 11 clinical features across 66 samples, with significance thresholds of P value < 0.05 and Q value < 0.3, found 6 clinical features related to at least one variable.

2 variables correlated to 'AGE'.

MUTATIONRATE_NONSYNONYMOUS , MUTATIONRATE_SILENT

1 variable correlated to 'AGE_mutation.rate'.

MUTATIONRATE_NONSYNONYMOUS

2 variables correlated to 'NEOPLASM.DISEASESTAGE'.

MUTATIONRATE_NONSYNONYMOUS , MUTATIONRATE_SILENT

2 variables correlated to 'PATHOLOGY.T.STAGE'.

MUTATIONRATE_SILENT , MUTATIONRATE_NONSYNONYMOUS

2 variables correlated to 'PATHOLOGY.N.STAGE'.

MUTATIONRATE_NONSYNONYMOUS , MUTATIONRATE_SILENT

1 variable correlated to 'ETHNICITY'.

MUTATIONRATE_NONSYNONYMOUS

No variables correlated to 'PATHOLOGY.M.STAGE', 'GENDER', 'KARNOFSKY.PERFORMANCE.SCORE', 'NUMBERPACKYEARSSMOKED', and 'RACE'.

Results

Overview of the results

The complete statistical result table is provided in Supplement Table 1.

Table 1. This table shows the clinical features, statistical methods used, and the number of variables that are significantly associated with each clinical feature at P value < 0.05 and Q value < 0.3.

Clinical feature	Statistical test	Significant variables	Associated with	Count	Associated with	Count
AGE	Spearman correlation test	N=2	older	N=2	younger	N=0
AGE	Linear Regression Analysis	N=1
NEOPLASM DISEASESTAGE	Kruskal-Wallis test	N=2
PATHOLOGY T STAGE	Spearman correlation test	N=2	higher stage	N=2	lower stage	N=0
PATHOLOGY N STAGE	Spearman correlation test	N=2	higher stage	N=2	lower stage	N=0
PATHOLOGY M STAGE	Kruskal-Wallis test	N=0
GENDER	Wilcoxon test	N=0
KARNOFSKY PERFORMANCE SCORE	Spearman correlation test	N=0
NUMBERPACKYEARSSMOKED	Spearman correlation test	N=0
RACE	Kruskal-Wallis test	N=0
ETHNICITY	Wilcoxon test	N=1	not hispanic or latino	N=1	hispanic or latino	N=0

Clinical variable #1: 'AGE'

2 variables related to 'AGE'.

Table S1. Basic characteristics of clinical feature: 'AGE'


AGE	Mean (SD)	51.52 (14)
	Significant variables	N = 2
	pos. correlated	2
	neg. correlated	0

List of 2 variables associated with 'AGE'

Table S2. List of 2 variables significantly correlated to 'AGE' by Spearman correlation test

	SpearmanCorr	corrP	Q
MUTATIONRATE_NONSYNONYMOUS	0.4443	0.0001863	0.000373
MUTATIONRATE_SILENT	0.3298	0.006846	0.00685

Clinical variable #2: 'AGE'

One variable related to 'AGE'.

Table S3. Basic characteristics of clinical feature: 'AGE'


AGE	Mean (SD)	51.52 (14)
	Significant variables	N = 1

List of one variable associated with 'AGE'

Table S4. List of one variable significantly correlated to 'AGE' by linear regression analysis [lm (mutation rate ~ age)]. Whereas a correlation analysis tests for interdependence between variables, a regression model describes the dependence of a response variable on one or more explanatory variables, assuming a one-way causal effect from the explanatory variable(s) to the response. A random pattern in the 'Residuals vs Fitted' plot (a standard residual plot) indicates a good fit for a linear model and supports a linear relationship between mutation rate and age. Adj.R-squared (R-squared = explained variation / total variation, adjusted for the number of explanatory variables) indicates the regression model's explanatory power.

	Adj.R.squared	F	P	Residual.std.err	DF	coef(intercept)	coef.p(intercept)
MUTATIONRATE_NONSYNONYMOUS	0.0443	4.01	0.0494	2.47e-06	64	4.3e-08 ( -4.64e-07 )	0.0494 ( 0.687 )
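The reported Adj.R.squared can be cross-checked from the F statistic and residual degrees of freedom in the row above. The following is a minimal sketch, assuming a single explanatory variable (age) and using the standard identities for R-squared and adjusted R-squared. The function names are hypothetical and are not part of the pipeline.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """R-squared implied by an overall F test: F*df1 / (F*df1 + df2)."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from Table S4 (F = 4.01, DF = 64); one predictor is assumed.
f_stat, df_resid, n_predictors = 4.01, 64, 1
n_obs = df_resid + n_predictors + 1  # residual DF = n - p - 1, so n = 66

r2 = r_squared_from_f(f_stat, n_predictors, df_resid)
adj_r2 = adjusted_r_squared(r2, n_obs, n_predictors)
print(f"n = {n_obs}, R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

Under these assumptions the implied sample size agrees with the 66 patients listed in the Input section, and the adjusted value agrees with the tabulated 0.0443 to within rounding of F.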

Clinical variable #3: 'NEOPLASM.DISEASESTAGE'

2 variables related to 'NEOPLASM.DISEASESTAGE'.

Table S5. Basic characteristics of clinical feature: 'NEOPLASM.DISEASESTAGE'


NEOPLASM.DISEASESTAGE	Labels	N
	STAGE I	21
	STAGE II	25
	STAGE III	14
	STAGE IV	6

	Significant variables	N = 2

List of 2 variables associated with 'NEOPLASM.DISEASESTAGE'

Table S6. List of 2 variables differentially expressed by 'NEOPLASM.DISEASESTAGE'

	ANOVA_P	Q
MUTATIONRATE_NONSYNONYMOUS	0.02758	0.035
MUTATIONRATE_SILENT	0.01748	0.035

Clinical variable #4: 'PATHOLOGY.T.STAGE'

2 variables related to 'PATHOLOGY.T.STAGE'.

Table S7. Basic characteristics of clinical feature: 'PATHOLOGY.T.STAGE'


PATHOLOGY.T.STAGE	Mean (SD)	2.02 (0.85)
		N
	1	21
	2	25
	3	18
	4	2

	Significant variables	N = 2
	pos. correlated	2
	neg. correlated	0

List of 2 variables associated with 'PATHOLOGY.T.STAGE'

Table S8. List of 2 variables significantly correlated to 'PATHOLOGY.T.STAGE' by Spearman correlation test

	SpearmanCorr	corrP	Q
MUTATIONRATE_SILENT	0.3581	0.003157	0.00631
MUTATIONRATE_NONSYNONYMOUS	0.2891	0.01856	0.0186

Clinical variable #5: 'PATHOLOGY.N.STAGE'

2 variables related to 'PATHOLOGY.N.STAGE'.

Table S9. Basic characteristics of clinical feature: 'PATHOLOGY.N.STAGE'


PATHOLOGY.N.STAGE	Mean (SD)	0.16 (0.47)
		N
	0	40
	1	3
	2	2

	Significant variables	N = 2
	pos. correlated	2
	neg. correlated	0

List of 2 variables associated with 'PATHOLOGY.N.STAGE'

Table S10. List of 2 variables significantly correlated to 'PATHOLOGY.N.STAGE' by Spearman correlation test

	SpearmanCorr	corrP	Q
MUTATIONRATE_NONSYNONYMOUS	0.4184	0.004231	0.00846
MUTATIONRATE_SILENT	0.354	0.01703	0.017

Clinical variable #6: 'PATHOLOGY.M.STAGE'

No variable related to 'PATHOLOGY.M.STAGE'.

Table S11. Basic characteristics of clinical feature: 'PATHOLOGY.M.STAGE'


PATHOLOGY.M.STAGE	Labels	N
	M0	34
	M1	2
	MX	9

	Significant variables	N = 0

Clinical variable #7: 'GENDER'

No variable related to 'GENDER'.

Table S12. Basic characteristics of clinical feature: 'GENDER'


GENDER	Labels	N
	FEMALE	27
	MALE	39

	Significant variables	N = 0

Clinical variable #8: 'KARNOFSKY.PERFORMANCE.SCORE'

No variable related to 'KARNOFSKY.PERFORMANCE.SCORE'.

Table S13. Basic characteristics of clinical feature: 'KARNOFSKY.PERFORMANCE.SCORE'


KARNOFSKY.PERFORMANCE.SCORE	Mean (SD)	89.09 (9.4)
	Score	N
	70	1
	80	2
	90	5
	100	3

	Significant variables	N = 0

Clinical variable #9: 'NUMBERPACKYEARSSMOKED'

No variable related to 'NUMBERPACKYEARSSMOKED'.

Table S14. Basic characteristics of clinical feature: 'NUMBERPACKYEARSSMOKED'


NUMBERPACKYEARSSMOKED	Mean (SD)	25.09 (22)
	Significant variables	N = 0

Clinical variable #10: 'RACE'

No variable related to 'RACE'.

Table S15. Basic characteristics of clinical feature: 'RACE'


RACE	Labels	N
	ASIAN	2
	BLACK OR AFRICAN AMERICAN	4
	WHITE	58

	Significant variables	N = 0

Clinical variable #11: 'ETHNICITY'

One variable related to 'ETHNICITY'.

Table S16. Basic characteristics of clinical feature: 'ETHNICITY'


ETHNICITY	Labels	N
	HISPANIC OR LATINO	4
	NOT HISPANIC OR LATINO	32

	Significant variables	N = 1
	Higher in NOT HISPANIC OR LATINO	1
	Higher in HISPANIC OR LATINO	0

List of one variable associated with 'ETHNICITY'

Table S17. List of one variable differentially expressed by 'ETHNICITY'

	W(pos if higher in 'NOT HISPANIC OR LATINO')	wilcoxontestP	Q	AUC
MUTATIONRATE_NONSYNONYMOUS	105	0.04149	0.083	0.8203
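The AUC column is consistent with the usual relation between the rank-sum statistic and the area under the ROC curve, AUC = W / (n1 * n2). The sketch below assumes that W = 105 is the Mann-Whitney form of the statistic for the 'NOT HISPANIC OR LATINO' group and that the group sizes are those in Table S16. The function name is hypothetical.

```python
def auc_from_w(w_stat, n_first, n_second):
    """AUC implied by a Mann-Whitney W (U) statistic for two groups."""
    return w_stat / (n_first * n_second)


# W from Table S17; group sizes from Table S16 (32 and 4 patients).
auc = auc_from_w(105, 32, 4)
print(f"AUC = {auc:.4f}")
```

An AUC near 0.82 means that, for a randomly chosen pair of patients (one from each group), the 'NOT HISPANIC OR LATINO' patient has the higher nonsynonymous mutation rate about 82% of the time, with ties counted as half.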

Methods & Data

Input

Expression data file = KICH-TP.patients.counts_and_rates.txt
Clinical data file = KICH-TP.merged_data.txt
Number of patients = 66
Number of variables = 2
Number of clinical features = 11

Correlation analysis

For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
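As an illustration of what the coefficient measures, Spearman's rho is the Pearson correlation of the two rank vectors, with tied values given their average rank. The sketch below uses plain Python rather than R's 'cor.test', omits the P value, and runs on hypothetical toy data. None of the numbers come from the KICH cohort.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages (years) and mutation rates (mutations per base).
ages = [35, 42, 50, 58, 63, 71]
rates = [0.8e-6, 1.1e-6, 0.9e-6, 1.6e-6, 1.4e-6, 2.0e-6]
rho = spearman_rho(ages, rates)
print(f"rho = {rho:.4f}")
```

Because only ranks enter the calculation, the coefficient is unaffected by the scale of the mutation rates and is robust to outlying values.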

ANOVA analysis

For multi-class clinical features (ordinal or nominal), one-way analysis of variance (Howell 2002) was applied to compare the log2-expression levels between different clinical classes using the 'anova' function in R.
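The F statistic underlying a one-way analysis of variance compares the variation between class means with the variation within classes. The following is a minimal sketch in plain Python, not the R 'anova' call itself. The three groups are hypothetical values, and the P value (which requires the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of value lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of class means around the grand mean, weighted by class size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three clinical classes.
classes = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(classes)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F indicates that the class means differ by more than the within-class scatter would suggest.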

Student's t-test analysis

For two-class clinical features, a two-tailed Student's t test with unequal variance (Lehmann and Romano 2005) was applied to compare the log2-expression levels between the two clinical classes using the 't.test' function in R.
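For the unequal-variance case, the test statistic and its approximate degrees of freedom (Welch-Satterthwaite) can be sketched as follows. This is an illustration in plain Python on hypothetical data, not the R 't.test' call. The two-tailed P value, which needs the t distribution, is omitted.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t test with unequal variances."""
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t_stat, df


# Hypothetical values for two clinical classes.
class_a = [1.0, 2.0, 3.0, 4.0, 5.0]
class_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(class_a, class_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The degrees of freedom fall below the pooled-variance value of n1 + n2 - 2 whenever the two sample variances differ, which makes the test more conservative.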

Q value calculation

For multiple hypothesis correction, the Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the 'Benjamini and Hochberg' method of the 'p.adjust' function in R to convert P values into Q values.
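The Benjamini-Hochberg adjustment multiplies each P value by m / rank (with m tests and ranks in ascending order of P) and then enforces monotonicity from the largest P value downward. The sketch below is a plain-Python illustration of that procedure, not the R 'p.adjust' call. The function name is hypothetical, and applying it to the two Spearman P values for 'AGE' assumes that the correction was made across the 2 variables within that clinical feature.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    # Visit the P values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Spearman P values for 'AGE' (nonsynonymous, silent) from Table S2.
p_values = [0.0001863, 0.006846]
q_values = bh_adjust(p_values)
print(q_values)
```

Under that assumption the results round to the Q values listed for 'AGE' in Table S2 (0.000373 and 0.00685).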

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References

[1] Spearman, C, The proof and measurement of association between two things, Amer. J. Psychol 15:72-101 (1904)

[2] Howell, D, Statistical Methods for Psychology. (5th ed.), Duxbury Press:324-5 (2002)

[3] Lehmann and Romano, Testing Statistical Hypotheses (3rd ed.), New York: Springer. ISBN 0387988645 (2005)

[4] Benjamini and Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B 57:289-300 (1995)
