This pipeline uses various statistical tests to identify mRNAs whose expression levels correlate with selected clinical features.
The association between 17814 genes and 6 clinical features was tested across 72 samples. At a threshold of Q value < 0.05, 4 clinical features were related to at least one gene.
- 13 genes correlated with 'GENDER'.
  - DDX3Y, RPS4Y1, CYORF15A, EIF1AY, RPS4Y2, ...
- 64 genes correlated with 'DISTANT.METASTASIS'.
  - IGFBPL1, CYORF15B, DDX3Y, EIF1AY, OAS1, ...
- 41 genes correlated with 'LYMPH.NODE.METASTASIS'.
  - C10ORF97, LCE2C, SLC35A5, SRP9, VPS24, ...
- 66 genes correlated with 'NEOPLASM.DISEASESTAGE'.
  - TCF23, PAX4, OPRS1, SNRP70, JARID1C, ...
- No genes correlated with 'Time to Death' or 'AGE'.
The complete statistical result table is provided in Supplementary Table 1.
Table 1. This table shows the clinical features, the statistical tests used, and the number of genes significantly associated with each clinical feature at Q value < 0.05.
Clinical feature | Statistical test | Significant genes | Higher in | N | Higher in | N |
---|---|---|---|---|---|---|
Time to Death | Cox regression test | N=0 | | | | |
AGE | Spearman correlation test | N=0 | | | | |
GENDER | t test | N=13 | male | N=13 | female | N=0 |
DISTANT METASTASIS | t test | N=64 | m1 | N=23 | m0 | N=41 |
LYMPH NODE METASTASIS | ANOVA test | N=41 | | | | |
NEOPLASM DISEASESTAGE | ANOVA test | N=66 | | | | |
Table S1. Basic characteristics of clinical feature: 'Time to Death'
Time to Death | Duration (months) | 0.5-101.1 (median=32.6) |
---|---|---|
censored | N = 58 | |
death | N = 13 | |
Significant markers | N = 0 | |
Table S2. Basic characteristics of clinical feature: 'AGE'
AGE | Mean (SD) | 60.55 (12) |
---|---|---|
Significant markers | N = 0 | |
Table S3. Basic characteristics of clinical feature: 'GENDER'
GENDER (label) | N |
---|---|
FEMALE | 29 |
MALE | 43 |
Significant markers | 13 |
Higher in MALE | 13 |
Higher in FEMALE | 0 |
Table S4. List of top 10 genes differentially expressed by 'GENDER'
Gene | T (pos if higher in 'MALE') | ttestP | Q | AUC |
---|---|---|---|---|
DDX3Y | 13.71 | 6.875e-21 | 1.22e-16 | 0.9575 |
RPS4Y1 | 14.08 | 1.088e-20 | 1.94e-16 | 0.9599 |
CYORF15A | 13.48 | 2.222e-20 | 3.96e-16 | 0.9655 |
EIF1AY | 13.4 | 2.577e-20 | 4.59e-16 | 0.9623 |
RPS4Y2 | 13.25 | 3.662e-20 | 6.52e-16 | 0.9527 |
JARID1D | 12.43 | 5.763e-19 | 1.03e-14 | 0.9583 |
ZFY | 11.55 | 9.008e-18 | 1.6e-13 | 0.9615 |
CYORF15B | 11 | 6.54e-17 | 1.16e-12 | 0.9415 |
UTY | 10.7 | 2.396e-16 | 4.27e-12 | 0.9479 |
USP9Y | 9.59 | 2.26e-14 | 4.02e-10 | 0.9318 |
Figure S1. As an example, this figure shows the association of DDX3Y with 'GENDER'. P value = 6.87e-21 with t-test analysis.

Table S5. Basic characteristics of clinical feature: 'DISTANT.METASTASIS'
DISTANT.METASTASIS (label) | N |
---|---|
M0 | 67 |
M1 | 5 |
Significant markers | 64 |
Higher in M1 | 23 |
Higher in M0 | 41 |
Table S6. List of top 10 genes differentially expressed by 'DISTANT.METASTASIS'
Gene | T (pos if higher in 'M1') | ttestP | Q | AUC |
---|---|---|---|---|
IGFBPL1 | -10.69 | 2.075e-15 | 3.7e-11 | 0.8955 |
CYORF15B | -10.34 | 1.448e-14 | 2.58e-10 | 0.9224 |
DDX3Y | -9.7 | 1.45e-14 | 2.58e-10 | 0.809 |
EIF1AY | -10.87 | 2.906e-14 | 5.18e-10 | 0.9493 |
OAS1 | 9.58 | 2.205e-12 | 3.93e-08 | 0.9433 |
RESP18 | 9.36 | 1.212e-11 | 2.16e-07 | 0.9463 |
PELI2 | -8.79 | 1.784e-11 | 3.18e-07 | 0.8657 |
UTY | -10.33 | 3.74e-11 | 6.66e-07 | 0.9821 |
JARID1D | -10.14 | 4.94e-11 | 8.8e-07 | 0.9045 |
HOXA7 | -9.6 | 6.012e-11 | 1.07e-06 | 0.8776 |
Figure S2. As an example, this figure shows the association of IGFBPL1 with 'DISTANT.METASTASIS'. P value = 2.07e-15 with t-test analysis.

Table S7. Basic characteristics of clinical feature: 'LYMPH.NODE.METASTASIS'
LYMPH.NODE.METASTASIS (label) | N |
---|---|
N0 | 35 |
N1 | 3 |
NX | 34 |
Significant markers | 41 |
Table S8. List of top 10 genes differentially expressed by 'LYMPH.NODE.METASTASIS'
Gene | ANOVA_P | Q |
---|---|---|
C10ORF97 | 3.728e-09 | 6.64e-05 |
LCE2C | 6.296e-09 | 0.000112 |
SLC35A5 | 4.802e-08 | 0.000855 |
SRP9 | 5.772e-08 | 0.00103 |
VPS24 | 6.125e-08 | 0.00109 |
ST13 | 1.153e-07 | 0.00205 |
MAPK15 | 1.318e-07 | 0.00235 |
OR2F2 | 2.026e-07 | 0.00361 |
TMEM38B | 2.54e-07 | 0.00452 |
DEFB106B | 2.806e-07 | 0.005 |
Figure S3. As an example, this figure shows the association of C10ORF97 with 'LYMPH.NODE.METASTASIS'. P value = 3.73e-09 with ANOVA analysis.

Table S9. Basic characteristics of clinical feature: 'NEOPLASM.DISEASESTAGE'
NEOPLASM.DISEASESTAGE (label) | N |
---|---|
STAGE I | 40 |
STAGE II | 13 |
STAGE III | 14 |
STAGE IV | 5 |
Significant markers | 66 |
Table S10. List of top 10 genes differentially expressed by 'NEOPLASM.DISEASESTAGE'
Gene | ANOVA_P | Q |
---|---|---|
TCF23 | 6.108e-12 | 1.09e-07 |
PAX4 | 6.492e-11 | 1.16e-06 |
OPRS1 | 9.388e-10 | 1.67e-05 |
SNRP70 | 2.371e-09 | 4.22e-05 |
JARID1C | 1.003e-08 | 0.000179 |
MSTO1 | 1.063e-08 | 0.000189 |
SBNO2 | 1.133e-08 | 0.000202 |
ZNF646 | 2.465e-08 | 0.000439 |
GPR152 | 3.295e-08 | 0.000587 |
POM121L1 | 3.476e-08 | 0.000619 |
Figure S4. As an example, this figure shows the association of TCF23 with 'NEOPLASM.DISEASESTAGE'. P value = 6.11e-12 with ANOVA analysis.

- Expression data file = KIRC-TP.medianexp.txt
- Clinical data file = KIRC-TP.clin.merged.picked.txt
- Number of patients = 72
- Number of genes = 17814
- Number of clinical features = 6
For survival clinical features, the Wald test in univariate Cox regression analysis with a proportional hazards model (Andersen and Gill 1982) was used to estimate P values using the 'coxph' function in R. Kaplan-Meier survival curves were plotted for the four quartile subgroups of patients defined by expression level.
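The Kaplan-Meier step can be illustrated with a minimal Python sketch (standard library only). It is not the pipeline's R code: the function names are hypothetical, quartiles are assigned by expression rank with ties broken by input order (an assumption, since the report does not say how quartile boundaries were drawn), and the Cox/Wald test itself is left to a dedicated routine such as R's `coxph`.

```python
from collections import defaultdict


def quartile_groups(expression):
    """Assign each sample to an expression quartile, 1 (lowest) to 4 (highest), by rank.

    Assumption: equal-size rank bins; ties are broken by input order.
    """
    n = len(expression)
    order = sorted(range(n), key=lambda i: expression[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n + 1
    return groups


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct death time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all observations that share this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at time t leaves the risk set
    return curve


def km_by_quartile(expression, times, events):
    """One Kaplan-Meier curve per expression quartile of a single gene."""
    groups = quartile_groups(expression)
    pooled = defaultdict(lambda: ([], []))
    for g, t, e in zip(groups, times, events):
        pooled[g][0].append(t)
        pooled[g][1].append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in sorted(pooled.items())}
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to about 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored observations at times 2 and 4 shrink the risk set without lowering the curve.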
For continuous numerical clinical features, Spearman's rank correlation coefficients (Spearman 1904) and two-tailed P values were estimated using the 'cor.test' function in R.
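As a minimal illustration of what the coefficient measures, the Python sketch below computes Spearman's rho as the Pearson correlation of average ranks, the usual way ties are handled. The function names are hypothetical, and the sketch stops at the coefficient: the P value that R's `cor.test` also reports would need a separate distributional step that the standard library does not provide.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting `x` would be the patients' ages and `y` one gene's expression values; working on ranks makes rho insensitive to any monotone transformation of either variable, including the log2 scale.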
For two-class clinical features, a two-tailed t test with unequal variances (Welch's t test; Lehmann and Romano 2005) was applied to compare log2 expression levels between the two clinical classes using the 't.test' function in R.
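A minimal Python sketch of the statistic behind the T columns of Tables S4 and S6, assuming the inputs are already log2 expression values. It computes Welch's t and the Welch-Satterthwaite degrees of freedom; `welch_t` is a hypothetical name, not the pipeline's code, and turning the statistic into the two-tailed P value that `t.test` reports would additionally require a t-distribution CDF.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: log2 expression values of one gene in the two clinical classes.
    The sign is positive when the mean of `a` is higher.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

To match a header such as "pos if higher in 'MALE'", the class named in the header would be passed as `a`. Because each class keeps its own variance, a small class such as M1 (5 patients) does not borrow the spread of the much larger M0 class.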
For multi-class clinical features (ordinal or nominal), one-way analysis of variance (Howell 2002) was applied to compare log2 expression levels between the clinical classes using the 'anova' function in R.
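The one-way ANOVA comparison can be sketched in Python as the ratio of between-class to within-class mean squares. This is an illustration under the assumption of log2 input values, not the pipeline's R code; `one_way_anova_f` is a hypothetical name, and the P value would further require the F-distribution CDF, which the standard library lacks. As in any one-way ANOVA, ordinal classes such as disease stages are treated here as unordered groups.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its between- and within-group degrees of freedom.

    groups: one list of log2 expression values per clinical class.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Spread of the class means around the grand mean, weighted by class size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of each value around its own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For 'NEOPLASM.DISEASESTAGE', `groups` would hold four lists (stages I to IV), giving 3 and 68 degrees of freedom for the 72 patients.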
For multiple hypothesis correction, the Q value is the False Discovery Rate (FDR) analogue of the P value (Benjamini and Hochberg 1995), defined as the minimum FDR at which the test may be called significant. We used the 'BH' (Benjamini and Hochberg) method of the 'p.adjust' function in R to convert P values into Q values.
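A minimal Python sketch of the Benjamini-Hochberg step-up adjustment, written to follow the same rule as `p.adjust(p, method = "BH")`: scale the i-th smallest of m P values by m/i, then take a running minimum from the largest P value down so the Q values stay monotone, capping them at 1. `benjamini_hochberg` is a hypothetical name; this is an illustration, not the pipeline's code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), returned in input order."""
    m = len(pvalues)
    # Visit P values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0  # Q values are capped at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For instance, P values `[0.01, 0.04, 0.03, 0.005]` map to Q values of about `[0.02, 0.04, 0.04, 0.02]`. In this report m is the 17814 genes tested against one clinical feature.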