Acute Myeloid Leukemia: Correlation between molecular cancer subtypes and selected clinical features
Maintained by TCGA GDAC Team (Broad Institute/Dana-Farber Cancer Institute/Harvard Medical School)
Overview
Introduction

This pipeline computes the correlation between cancer subtypes identified by different molecular patterns and selected clinical features.

Summary

Testing the association between subtypes identified by 5 different clustering approaches and 3 clinical features across 197 patients detected 5 significant findings at P value < 0.05.

  • 'METHLYATION CNMF' identified 4 subtypes in the current cancer cohort. These subtypes correlate with 'Time to Death' and 'AGE'.

  • CNMF clustering analysis on sequencing-based mRNA expression data identified 3 subtypes that do not correlate with any of the selected clinical features.

  • Consensus hierarchical clustering analysis on sequencing-based mRNA expression data identified 2 subtypes that correlate with 'AGE'.

  • CNMF clustering analysis on sequencing-based miR expression data identified 3 subtypes that correlate with 'Time to Death'.

  • Consensus hierarchical clustering analysis on sequencing-based miR expression data identified 3 subtypes that correlate with 'Time to Death'.

Results
Overview of the results

Table 1.  Overview of the association between subtypes identified by 5 different clustering approaches and 3 clinical features. Shown in the table are P values from statistical tests. At a threshold of P value < 0.05, 5 significant findings were detected.

Clinical Features            Time to Death   AGE        GENDER
Statistical Tests            logrank test    ANOVA      Fisher's exact test
METHLYATION CNMF             2.49e-07        4.85e-08   0.41
RNAseq CNMF subtypes         0.125           0.39       0.312
RNAseq cHierClus subtypes    0.683           0.0292     1
MIRseq CNMF subtypes         0.00738         0.221      0.797
MIRseq cHierClus subtypes    0.00985         0.077      0.767
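
The count of significant findings can be checked directly from the P values in Table 1. The following is a minimal Python sketch, not part of the original pipeline; the dictionary layout and the function name `significant_pairs` are illustrative assumptions, while the P values and the 0.05 threshold are taken from the table.

```python
# P values transcribed from Table 1 (outer key: clustering approach,
# inner key: clinical feature). The data layout is an assumption.
P_VALUES = {
    "METHLYATION CNMF": {"Time to Death": 2.49e-07, "AGE": 4.85e-08, "GENDER": 0.41},
    "RNAseq CNMF subtypes": {"Time to Death": 0.125, "AGE": 0.39, "GENDER": 0.312},
    "RNAseq cHierClus subtypes": {"Time to Death": 0.683, "AGE": 0.0292, "GENDER": 1.0},
    "MIRseq CNMF subtypes": {"Time to Death": 0.00738, "AGE": 0.221, "GENDER": 0.797},
    "MIRseq cHierClus subtypes": {"Time to Death": 0.00985, "AGE": 0.077, "GENDER": 0.767},
}


def significant_pairs(p_values, alpha=0.05):
    """Return (approach, feature, p) triples with p strictly below alpha."""
    return [
        (approach, feature, p)
        for approach, features in p_values.items()
        for feature, p in features.items()
        if p < alpha
    ]


hits = significant_pairs(P_VALUES)
for approach, feature, p in hits:
    print(f"{approach} vs {feature}: P = {p}")
print(len(hits), "significant findings")
```

Note that the threshold is applied to unadjusted P values, as in the report; no multiple-testing correction is sketched here.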
Clustering Approach #1: 'METHLYATION CNMF'

Table S1.  Description of clustering approach #1: 'METHLYATION CNMF'

Cluster Labels 1 2 3 4
Number of samples 72 45 65 12
'METHLYATION CNMF' versus 'Time to Death'

P value = 2.49e-07 (logrank test)

Table S2.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #1: 'Time to Death'

nPatients nDeath Duration Range (Median), Month
ALL 169 106 0.9 - 94.1 (12.0)
subtype1 63 45 1.0 - 62.0 (8.1)
subtype2 42 13 0.9 - 94.1 (21.6)
subtype3 54 42 0.9 - 69.0 (12.0)
subtype4 10 6 9.0 - 73.0 (23.0)

Figure S1.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #1: 'Time to Death'

'METHLYATION CNMF' versus 'AGE'

P value = 4.85e-08 (ANOVA)

Table S3.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #2: 'AGE'

nPatients Mean (Std.Dev)
ALL 194 55.1 (16.0)
subtype1 72 55.5 (15.1)
subtype2 45 45.6 (15.5)
subtype3 65 62.8 (12.8)
subtype4 12 46.9 (17.7)

Figure S2.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #2: 'AGE'

'METHLYATION CNMF' versus 'GENDER'

P value = 0.41 (Fisher's exact test)

Table S4.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #3: 'GENDER'

nPatients FEMALE MALE
ALL 89 105
subtype1 35 37
subtype2 24 21
subtype3 26 39
subtype4 4 8

Figure S3.  Clustering Approach #1: 'METHLYATION CNMF' versus Clinical Feature #3: 'GENDER'

Clustering Approach #2: 'RNAseq CNMF subtypes'

Table S5.  Description of clustering approach #2: 'RNAseq CNMF subtypes'

Cluster Labels 1 2 3
Number of samples 74 57 48
'RNAseq CNMF subtypes' versus 'Time to Death'

P value = 0.125 (logrank test)

Table S6.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #1: 'Time to Death'

nPatients nDeath Duration Range (Median), Month
ALL 157 97 0.9 - 94.1 (12.0)
subtype1 67 39 1.0 - 94.1 (16.1)
subtype2 50 32 0.9 - 75.1 (11.0)
subtype3 40 26 0.9 - 62.0 (9.5)

Figure S4.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #1: 'Time to Death'

'RNAseq CNMF subtypes' versus 'AGE'

P value = 0.39 (ANOVA)

Table S7.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #2: 'AGE'

nPatients Mean (Std.Dev)
ALL 179 55.0 (15.9)
subtype1 74 53.8 (17.1)
subtype2 57 57.4 (13.6)
subtype3 48 54.0 (16.7)

Figure S5.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #2: 'AGE'

'RNAseq CNMF subtypes' versus 'GENDER'

P value = 0.312 (Fisher's exact test)

Table S8.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #3: 'GENDER'

nPatients FEMALE MALE
ALL 84 95
subtype1 33 41
subtype2 24 33
subtype3 27 21

Figure S6.  Clustering Approach #2: 'RNAseq CNMF subtypes' versus Clinical Feature #3: 'GENDER'

Clustering Approach #3: 'RNAseq cHierClus subtypes'

Table S9.  Description of clustering approach #3: 'RNAseq cHierClus subtypes'

Cluster Labels 1 2
Number of samples 61 118
'RNAseq cHierClus subtypes' versus 'Time to Death'

P value = 0.683 (logrank test)

Table S10.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #1: 'Time to Death'

nPatients nDeath Duration Range (Median), Month
ALL 157 97 0.9 - 94.1 (12.0)
subtype1 52 33 0.9 - 75.1 (11.5)
subtype2 105 64 0.9 - 94.1 (12.9)

Figure S7.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #1: 'Time to Death'

'RNAseq cHierClus subtypes' versus 'AGE'

P value = 0.0292 (t-test)

Table S11.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #2: 'AGE'

nPatients Mean (Std.Dev)
ALL 179 55.0 (15.9)
subtype1 61 58.3 (12.8)
subtype2 118 53.3 (17.1)

Figure S8.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #2: 'AGE'

'RNAseq cHierClus subtypes' versus 'GENDER'

P value = 1 (Fisher's exact test)

Table S12.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #3: 'GENDER'

nPatients FEMALE MALE
ALL 84 95
subtype1 29 32
subtype2 55 63

Figure S9.  Clustering Approach #3: 'RNAseq cHierClus subtypes' versus Clinical Feature #3: 'GENDER'

Clustering Approach #4: 'MIRseq CNMF subtypes'

Table S13.  Description of clustering approach #4: 'MIRseq CNMF subtypes'

Cluster Labels 1 2 3
Number of samples 86 40 61
'MIRseq CNMF subtypes' versus 'Time to Death'

P value = 0.00738 (logrank test)

Table S14.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #1: 'Time to Death'

nPatients nDeath Duration Range (Median), Month
ALL 163 101 0.9 - 94.1 (12.0)
subtype1 75 54 0.9 - 73.0 (10.0)
subtype2 36 18 0.9 - 62.0 (14.5)
subtype3 52 29 1.0 - 94.1 (15.0)

Figure S10.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #1: 'Time to Death'

'MIRseq CNMF subtypes' versus 'AGE'

P value = 0.221 (ANOVA)

Table S15.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #2: 'AGE'

nPatients Mean (Std.Dev)
ALL 187 55.1 (16.0)
subtype1 86 56.1 (14.5)
subtype2 40 57.2 (14.5)
subtype3 61 52.2 (18.7)

Figure S11.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #2: 'AGE'

'MIRseq CNMF subtypes' versus 'GENDER'

P value = 0.797 (Fisher's exact test)

Table S16.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #3: 'GENDER'

nPatients FEMALE MALE
ALL 86 101
subtype1 39 47
subtype2 17 23
subtype3 30 31

Figure S12.  Clustering Approach #4: 'MIRseq CNMF subtypes' versus Clinical Feature #3: 'GENDER'

Clustering Approach #5: 'MIRseq cHierClus subtypes'

Table S17.  Description of clustering approach #5: 'MIRseq cHierClus subtypes'

Cluster Labels 1 2 3
Number of samples 38 82 67
'MIRseq cHierClus subtypes' versus 'Time to Death'

P value = 0.00985 (logrank test)

Table S18.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #1: 'Time to Death'

nPatients nDeath Duration Range (Median), Month
ALL 163 101 0.9 - 94.1 (12.0)
subtype1 34 16 0.9 - 62.0 (14.5)
subtype2 71 50 0.9 - 69.0 (10.0)
subtype3 58 35 1.0 - 94.1 (15.0)

Figure S13.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #1: 'Time to Death'

'MIRseq cHierClus subtypes' versus 'AGE'

P value = 0.077 (ANOVA)

Table S19.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #2: 'AGE'

nPatients Mean (Std.Dev)
ALL 187 55.1 (16.0)
subtype1 38 58.2 (14.2)
subtype2 82 56.4 (13.6)
subtype3 67 51.6 (19.1)

Figure S14.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #2: 'AGE'

'MIRseq cHierClus subtypes' versus 'GENDER'

P value = 0.767 (Fisher's exact test)

Table S20.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #3: 'GENDER'

nPatients FEMALE MALE
ALL 86 101
subtype1 16 22
subtype2 40 42
subtype3 30 37

Figure S15.  Clustering Approach #5: 'MIRseq cHierClus subtypes' versus Clinical Feature #3: 'GENDER'

Methods & Data
Input
  • Cluster data file = LAML.mergedcluster.txt

  • Clinical data file = LAML.clin.merged.picked.txt

  • Number of patients = 197

  • Number of clustering approaches = 5

  • Number of selected clinical features = 3

  • Exclude small clusters that include fewer than K patients, K = 3
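
The small-cluster exclusion listed above can be sketched as follows. This is an illustrative Python sketch, not the pipeline's code: the function name `drop_small_clusters`, the patient identifiers, and the cluster assignments are all hypothetical. Only the rule itself (drop clusters with fewer than K patients, K = 3) comes from the input settings.

```python
from collections import Counter


def drop_small_clusters(assignments, k=3):
    """Keep only patients whose cluster contains at least k patients.

    `assignments` maps a patient ID to a cluster label.
    """
    sizes = Counter(assignments.values())
    return {pid: label for pid, label in assignments.items() if sizes[label] >= k}


# Hypothetical assignments: cluster 2 has only two patients.
assignments = {
    "P01": 1, "P02": 1, "P03": 1,
    "P04": 2, "P05": 2,
    "P06": 3, "P07": 3, "P08": 3, "P09": 3,
}

kept = drop_small_clusters(assignments, k=3)
print(sorted(set(kept.values())))  # cluster labels that survive the filter
```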

Clustering approaches
CNMF clustering

Consensus non-negative matrix factorization (CNMF) clustering approach (Brunet et al. 2004)

Consensus hierarchical clustering

Resampling-based clustering method (Monti et al. 2003)
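
The idea behind resampling-based consensus clustering can be illustrated with a small Python sketch. Everything here is a simplifying assumption rather than the pipeline's method: the base clusterer `toy_two_way_split` is a one-dimensional midpoint split standing in for hierarchical clustering, the data are hypothetical, and all leave-one-out subsamples are enumerated in place of random resampling so that the result is deterministic. The consensus value for a pair of items is the fraction of subsamples containing both items in which they were assigned to the same cluster.

```python
from itertools import combinations


def toy_two_way_split(values):
    """Toy base clusterer: label each value by its side of the range midpoint."""
    midpoint = (min(values) + max(values)) / 2
    return [0 if v < midpoint else 1 for v in values]


def consensus_matrix(data, subsample_size):
    """Fraction of shared subsamples in which each pair of items co-clusters."""
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for subset in combinations(range(n), subsample_size):
        labels = toy_two_way_split([data[i] for i in subset])
        for (i, label_i), (j, label_j) in combinations(list(zip(subset, labels)), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if label_i == label_j:
                together[i][j] += 1
                together[j][i] += 1
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(1.0)  # an item always co-clusters with itself
            elif sampled[i][j]:
                row.append(together[i][j] / sampled[i][j])
            else:
                row.append(None)  # pair never drawn together
        matrix.append(row)
    return matrix


# Hypothetical data: two well-separated groups of three items each.
data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]
consensus = consensus_matrix(data, subsample_size=5)
for row in consensus:
    print(row)
```

Consensus values near 1 or 0 indicate stable assignments; intermediate values indicate pairs whose grouping depends on which samples were drawn.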

Survival analysis

For survival clinical features, Kaplan-Meier survival curves of the tumor subtypes were plotted, and the statistical significance (P value) was estimated by the logrank test (Bland and Altman 2004) using the 'survdiff' function in R.
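
To make the logrank computation concrete, here is a minimal pure-Python sketch for the two-group case. It is not the report's implementation, which used R's 'survdiff' and handles any number of groups; the function name `logrank_two_groups` and the toy follow-up times are hypothetical. At each distinct death time the observed deaths in group A are compared with the number expected if both groups shared one hazard, and the resulting chi-square statistic has 1 degree of freedom.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group logrank test. events: 1 = death observed, 0 = censored.

    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n = sum(1 for u, _, _ in samples if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in samples if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in samples if u == t and e)         # deaths at t
        d_a = sum(1 for u, e, g in samples if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("logrank statistic is undefined for these data")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical follow-up times in months; every patient died (event = 1).
early = [1, 2, 3, 4, 5]
late = [10, 11, 12, 13, 14]
chi_square, p_value = logrank_two_groups(early, [1] * 5, late, [1] * 5)
print(chi_square, p_value)
```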

ANOVA analysis

For continuous numerical clinical features, one-way analysis of variance (Howell 2002) was applied to compare the clinical values between tumor subtypes using the 'anova' function in R.
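
The one-way ANOVA F statistic can be approximated from per-subtype summary statistics alone. The Python sketch below is an illustration, not the pipeline's code (which used R's 'anova' on patient-level data); the function name `anova_f_from_summary` is hypothetical, and the inputs are the rounded counts, means, and standard deviations from Table S3, so the result is only approximate. The P value would be the upper tail of the F distribution with the returned degrees of freedom, which is not computed here.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) per group.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (n, mean age, standard deviation) for subtypes 1-4, from Table S3.
table_s3 = [(72, 55.5, 15.1), (45, 45.6, 15.5), (65, 62.8, 12.8), (12, 46.9, 17.7)]
f_stat, df_between, df_within = anova_f_from_summary(table_s3)
print(f_stat, df_between, df_within)
```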

Fisher's exact test

For binary clinical features, two-tailed Fisher's exact tests (Fisher 1922) were used to estimate the P values using the 'fisher.test' function in R
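
For a 2 x 2 table, the two-tailed Fisher's exact test can be written out in a few lines. This Python sketch is an illustration rather than the report's code, which used R's 'fisher.test' and also covers tables with more than two subtypes; the function name `fisher_exact_2x2` is hypothetical. It sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The counts are taken from Table S12.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Table S12: subtype1 = 29 female, 32 male; subtype2 = 55 female, 63 male.
p_gender = fisher_exact_2x2(29, 32, 55, 63)
print(p_gender)
```

The observed table is the most probable one given its margins, so every possible table is counted and the P value is 1, in line with the value reported for 'RNAseq cHierClus subtypes' versus 'GENDER'.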

Student's t-test analysis

For continuous numerical clinical features, a two-tailed Student's t-test with unequal variance (Lehmann and Romano 2005) was applied to compare the clinical values between two tumor subtypes using the 't.test' function in R.
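
The unequal-variance (Welch) t statistic and its approximate degrees of freedom can be computed from summary statistics. The Python sketch below is illustrative only: the report used R's 't.test' on patient-level data, the function name `welch_t_from_summary` is hypothetical, and the inputs are the rounded values from Table S11. The two-tailed P value would come from the t distribution with the returned degrees of freedom, which is not computed here.

```python
import math


def welch_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Table S11: subtype1 n = 61, mean 58.3, SD 12.8; subtype2 n = 118, mean 53.3, SD 17.1.
t_stat, df = welch_t_from_summary(61, 58.3, 12.8, 118, 53.3, 17.1)
print(t_stat, df)
```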

Download Results

This is an experimental feature. The full results of the analysis summarized in this report can be downloaded from the TCGA Data Coordination Center.

References
[1] Brunet et al., Metagenes and molecular pattern discovery using matrix factorization, PNAS 101(12):4164-9 (2004)
[2] Monti et al., Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data, Machine Learning 52(1-2):91-118 (2003)
[3] Bland and Altman, Statistics notes: The logrank test, BMJ 328(7447):1073 (2004)
[4] Howell, D., Statistical Methods for Psychology (5th ed.), Duxbury Press:324-5 (2002)
[5] Fisher, R.A., On the interpretation of chi-square from contingency tables, and the calculation of P, Journal of the Royal Statistical Society 85(1):87-94 (1922)
[6] Lehmann and Romano, Testing Statistical Hypotheses (3rd ed.), New York: Springer. ISBN 0387988645 (2005)