This pipeline uses various statistical tests to identify mRNAs whose log2 expression levels correlate with selected clinical features.
We tested the association between 18555 genes and 7 clinical features across 488 samples. At thresholds of P value < 0.05 and Q value < 0.3, 5 clinical features were related to at least one gene.
- 21 genes correlated to 'Time to Death'.
  - CD3EAP|10849, SRD5A1|6715, SCGB2A1|4246, JMJD7-PLA2G4B|8681, DKC1|1736, ...
- 664 genes correlated to 'AGE'.
  - DIO2|1734, FAM107A|11170, PTCH1|5727, S100A1|6271, NR2F6|2063, ...
- 4726 genes correlated to 'HISTOLOGICAL.TYPE'.
  - KIAA1324|57535, L1CAM|3897, FOXA2|3170, PPAP2C|8612, HIF3A|64344, ...
- 100 genes correlated to 'RADIATIONS.RADIATION.REGIMENINDICATION'.
  - RPL23AP82|284942, RPL23AP7|118433, LOC341056|341056, UBE2MP1|606551, EDARADD|128178, ...
- 69 genes correlated to 'RACE'.
  - SORD|6652, PPIL3|53938, ACTB|60, LOC90784|90784, NOTCH2NL|388677, ...
- No genes correlated to 'COMPLETENESS.OF.RESECTION' or 'ETHNICITY'.
The complete statistical result table is provided in Supplement Table 1.
Clinical feature | Statistical test | Significant genes | Associated with | N | Associated with | N |
---|---|---|---|---|---|---|
Time to Death | Cox regression test | N=21 | shorter survival | N=16 | longer survival | N=5 |
AGE | Spearman correlation test | N=664 | older | N=401 | younger | N=263 |
HISTOLOGICAL TYPE | Kruskal-Wallis test | N=4726 | | | | |
RADIATIONS RADIATION REGIMENINDICATION | Wilcoxon test | N=100 | yes | N=100 | no | N=0 |
COMPLETENESS OF RESECTION | Kruskal-Wallis test | N=0 | | | | |
RACE | Kruskal-Wallis test | N=69 | | | | |
ETHNICITY | Wilcoxon test | N=0 | | | | |
Time to Death | Duration (Months) | 0-191.8 (median=22.7) |
censored | N = 426 | |
death | N = 60 | |
Significant markers | N = 21 | |
associated with shorter survival | 16 | |
associated with longer survival | 5 |
Gene | HazardRatio | Wald_P | Q | C_index |
---|---|---|---|---|
CD3EAP|10849 | 2.8 | 4.005e-09 | 7.4e-05 | 0.678 |
SRD5A1|6715 | 2.6 | 5.841e-08 | 0.0011 | 0.669 |
SCGB2A1|4246 | 0.85 | 1.986e-06 | 0.037 | 0.304 |
JMJD7-PLA2G4B|8681 | 0.5 | 3.887e-06 | 0.072 | 0.33 |
DKC1|1736 | 2.6 | 4.175e-06 | 0.077 | 0.691 |
GOLGA7|51125 | 2.4 | 4.23e-06 | 0.078 | 0.638 |
MGAT4A|11320 | 1.61 | 4.696e-06 | 0.087 | 0.672 |
KIAA1324|57535 | 0.86 | 4.891e-06 | 0.091 | 0.337 |
CPS1|1373 | 1.26 | 5.822e-06 | 0.11 | 0.641 |
MRPL15|29088 | 2.4 | 7.032e-06 | 0.13 | 0.642 |
AGE | Mean (SD) | 63.7 (11) |
Significant markers | N = 664 | |
pos. correlated | 401 | |
neg. correlated | 263 |
Gene | SpearmanCorr | corrP | Q |
---|---|---|---|
DIO2|1734 | -0.3771 | 6.657e-18 | 1.24e-13 |
FAM107A|11170 | 0.3409 | 1.024e-14 | 1.9e-10 |
PTCH1|5727 | -0.3351 | 3.064e-14 | 5.68e-10 |
S100A1|6271 | 0.3345 | 3.383e-14 | 6.28e-10 |
NR2F6|2063 | 0.3215 | 3.585e-13 | 6.65e-09 |
HIF3A|64344 | 0.3237 | 3.823e-13 | 7.09e-09 |
SPTBN4|57731 | 0.318 | 6.936e-13 | 1.29e-08 |
DUSP9|1852 | 0.3288 | 8.862e-13 | 1.64e-08 |
MGAT4A|11320 | 0.3133 | 1.499e-12 | 2.78e-08 |
HRASLS|57110 | 0.3239 | 1.504e-12 | 2.79e-08 |
HISTOLOGICAL.TYPE | Labels | N |
ENDOMETRIOID ENDOMETRIAL ADENOCARCINOMA | 370 | |
MIXED SEROUS AND ENDOMETRIOID | 19 | |
SEROUS ENDOMETRIAL ADENOCARCINOMA | 99 | |
Significant markers | N = 4726 |
Gene | ANOVA_P | Q |
---|---|---|
KIAA1324|57535 | 2.103e-38 | 3.9e-34 |
L1CAM|3897 | 4.376e-38 | 8.12e-34 |
FOXA2|3170 | 2.664e-36 | 4.94e-32 |
PPAP2C|8612 | 5.16e-36 | 9.57e-32 |
HIF3A|64344 | 5.193e-36 | 9.63e-32 |
SLC6A12|6539 | 9.731e-35 | 1.81e-30 |
IL20RA|53832 | 9.972e-35 | 1.85e-30 |
TFF3|7033 | 3.197e-34 | 5.93e-30 |
CDKN1A|1026 | 9.645e-34 | 1.79e-29 |
SPDEF|25803 | 1.681e-33 | 3.12e-29 |
100 genes related to 'RADIATIONS.RADIATION.REGIMENINDICATION'.
RADIATIONS.RADIATION.REGIMENINDICATION | Labels | N |
NO | 137 | |
YES | 351 | |
Significant markers | N = 100 | |
Higher in YES | 100 | |
Higher in NO | 0 |
Gene | W(pos if higher in 'YES') | wilcoxontestP | Q | AUC |
---|---|---|---|---|
RPL23AP82|284942 | 14317 | 3.706e-12 | 6.88e-08 | 0.7023 |
RPL23AP7|118433 | 14517 | 1.01e-11 | 1.87e-07 | 0.6981 |
LOC341056|341056 | 15113 | 1.779e-10 | 3.3e-06 | 0.6857 |
UBE2MP1|606551 | 15723.5 | 2.795e-09 | 5.19e-05 | 0.673 |
EDARADD|128178 | 15777 | 3.527e-09 | 6.54e-05 | 0.6719 |
POTEE|445582 | 15556 | 5.884e-09 | 0.000109 | 0.6699 |
UBE2NL|389898 | 15526.5 | 7.531e-09 | 0.00014 | 0.6691 |
TPI1P3|728402 | 14842 | 8.41e-09 | 0.000156 | 0.6698 |
LOC100130932|100130932 | 16003 | 9.273e-09 | 0.000172 | 0.6672 |
PGAM4|441531 | 16066 | 1.209e-08 | 0.000224 | 0.6659 |
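The W and AUC columns in the table above are linked by the group sizes: a Mann-Whitney U statistic divided by the product of the two group sizes is the AUC for the group it was counted for. As a worked check (a sketch, not the pipeline's code), assume W is the statistic for the 'NO' group and that the gene has values in all 137 NO and 351 YES samples. Both are assumptions, since the report states neither. Under them, 1 - W / (n_NO * n_YES) reproduces the reported AUC for RPL23AP82 and RPL23AP7. Some rows (for example POTEE) do not match this arithmetic, so it should not be read as the pipeline's definition of the AUC column. The function name `auc_from_w` is hypothetical.

```python
def auc_from_w(w, n_no, n_yes):
    """AUC for 'higher in YES', assuming w is the U statistic of the 'NO' group.

    Assumes every sample in both groups contributes a value for the gene.
    """
    return 1.0 - w / (n_no * n_yes)


# Group sizes and W values copied from the tables above
N_NO, N_YES = 137, 351
auc_rpl23ap82 = auc_from_w(14317, N_NO, N_YES)
auc_rpl23ap7 = auc_from_w(14517, N_NO, N_YES)

print(round(auc_rpl23ap82, 4), round(auc_rpl23ap7, 4))
```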
COMPLETENESS.OF.RESECTION | Labels | N |
R0 | 337 | |
R1 | 22 | |
R2 | 17 | |
RX | 29 | |
Significant markers | N = 0 |
RACE | Labels | N |
AMERICAN INDIAN OR ALASKA NATIVE | 4 | |
ASIAN | 19 | |
BLACK OR AFRICAN AMERICAN | 74 | |
NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER | 9 | |
WHITE | 357 | |
Significant markers | N = 69 |
Gene | ANOVA_P | Q |
---|---|---|
SORD|6652 | 8.397e-12 | 1.56e-07 |
PPIL3|53938 | 1.311e-11 | 2.43e-07 |
ACTB|60 | 2.691e-10 | 4.99e-06 |
LOC90784|90784 | 3.191e-10 | 5.92e-06 |
NOTCH2NL|388677 | 5.702e-10 | 1.06e-05 |
UTS2|10911 | 3.443e-09 | 6.39e-05 |
LRRC37A2|474170 | 4.905e-09 | 9.1e-05 |
DHRS4L1|728635 | 1.347e-08 | 0.00025 |
APH1A|51107 | 1.385e-08 | 0.000257 |
CNN2|1265 | 1.991e-08 | 0.000369 |
- Expression data file = UCEC-TP.uncv2.mRNAseq_RSEM_normalized_log2.txt
- Clinical data file = UCEC-TP.merged_data.txt
- Number of patients = 488
- Number of genes = 18555
- Number of clinical features = 7
For survival clinical features, Wald's test in a univariate Cox regression analysis with a proportional hazards model (Andersen and Gill 1982) was used to estimate the P values, using the 'coxph' function in R. Kaplan-Meier survival curves were plotted using the four quartile subgroups of patients based on expression levels.
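The quartile grouping and the Kaplan-Meier estimate can be illustrated with a minimal sketch. It is written in standard-library Python purely for illustration (the pipeline itself used R), it does not reproduce the Cox regression, and the function names `quartile_groups` and `kaplan_meier` and the toy data are hypothetical.

```python
from statistics import quantiles


def quartile_groups(expression):
    """Assign each patient to quartile group 0..3 by expression level."""
    q1, q2, q3 = quantiles(expression, n=4)  # the three quartile cut points
    return [sum(x > cut for cut in (q1, q2, q3)) for x in expression]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct observed death time.

    times  -- follow-up duration per patient (e.g. months)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): months of follow-up, death indicator, log2 expression
times = [5, 8, 12, 12, 20, 25, 30, 41]
events = [1, 0, 1, 1, 0, 1, 0, 0]
expression = [9.1, 7.2, 8.8, 9.5, 6.9, 8.1, 7.0, 6.5]

groups = quartile_groups(expression)
curve = kaplan_meier(times, events)
```

In practice one curve would be estimated per quartile group by passing only that group's `times` and `events` to `kaplan_meier`.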
For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
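Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values sharing the mean of their ranks. A minimal standard-library Python sketch of that definition follows. It is an illustration rather than the pipeline's R code, it omits the P value, and the names `average_ranks` and `spearman_rho` and the toy data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): patient age and one gene's log2 expression
age = [45, 52, 58, 63, 63, 70, 77]
expr = [9.8, 9.1, 9.3, 8.4, 8.0, 7.5, 7.7]
rho = spearman_rho(age, expr)  # negative: expression falls with age here
```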
For multi-class clinical features (ordinal or nominal), one-way analysis of variance (Howell 2002) was applied to compare the log2-expression levels between different clinical classes using the 'anova' function in R.
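The one-way ANOVA F statistic compares the variance between class means with the variance within classes. A minimal standard-library Python sketch follows. It is an illustration rather than the pipeline's R code, it stops at the F statistic because the P value needs the F distribution, and the name `one_way_anova_f` and the toy data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (hypothetical): log2 expression of one gene in three clinical classes
class_a = [1.0, 2.0, 3.0]
class_b = [2.0, 3.0, 4.0]
class_c = [5.0, 6.0, 7.0]
f_stat, df_between, df_within = one_way_anova_f([class_a, class_b, class_c])
```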
For two-class clinical features, two-tailed Student's t test with unequal variance (Lehmann and Romano 2005) was applied to compare the log2-expression levels between the two clinical classes using the 't.test' function in R.
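The unequal-variance (Welch) form of the t test uses each class's own variance and the Welch-Satterthwaite approximation for the degrees of freedom. A minimal standard-library Python sketch follows. It is an illustration rather than the pipeline's R code, it omits the P value because that needs the t distribution, and the name `welch_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for the unequal-variance two-sample t test."""
    # Squared standard error of each group mean (sample variance / n)
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy data (hypothetical): log2 expression of one gene in two clinical classes
class_no = [1.0, 2.0, 3.0, 4.0, 5.0]
class_yes = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(class_no, class_yes)
```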
For multiple hypothesis correction, the Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the 'Benjamini and Hochberg' method of the 'p.adjust' function in R to convert P values into Q values.
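The Benjamini-Hochberg adjustment can be sketched in a few lines: each P value is scaled by n / rank, and a running minimum taken from the largest P value downward keeps the Q values monotone. The standard-library Python below is an illustration of that procedure, not the pipeline's R code, and is not claimed to reproduce the Q columns in the tables above. The name `bh_adjust` and the toy P values are hypothetical; the thresholds are the ones quoted at the top of this report.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    n = len(pvalues)
    # Walk from the largest P value to the smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this P value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


# Toy P values (hypothetical), one per gene
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = bh_adjust(pvals)

# Apply the report's thresholds: P value < 0.05 and Q value < 0.3
significant = [i for i, (p, q) in enumerate(zip(pvals, qvals))
               if p < 0.05 and q < 0.3]
```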
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.