This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is also associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.101, -0.0689, -0.0456, -0.0258, -0.0076, 0.0107, 0.03, 0.0526, and 0.0834, respectively.
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of these coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
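The percentile summary of the per-gene coefficients can be illustrated with a minimal sketch. It assumes linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`); the report does not state which percentile definition was used, and the function name and input values below are hypothetical.

```python
import statistics


def decile_summary(coefficients):
    """Return the 10th, 20th, ..., 90th percentiles of a list of coefficients."""
    # With n=10, statistics.quantiles returns the 9 cut points between deciles.
    # method="inclusive" interpolates linearly between order statistics
    # (an assumption; the report does not name its percentile method).
    cuts = statistics.quantiles(coefficients, n=10, method="inclusive")
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}


# Hypothetical per-gene correlation coefficients, not values from the report.
example = [-0.20, -0.10, -0.05, 0.00, 0.02, 0.05, 0.10, 0.30, 0.40, 0.50, 0.73]
summary = decile_summary(example)
for percentile, value in summary.items():
    print(f"{percentile}th percentile: {value:.4f}")
```

In the report, the input would be one coefficient per gene rather than this short illustrative list.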
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 248 | 545 | 247 |
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients, plotting their quantiles against those of a normal distribution. Points that depart from the blue line indicate deviation from normality.
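The coordinates behind a QQ plot like the right-hand panel can be sketched as follows. This is an illustration only: it assumes the reference normal distribution is fitted to the sample mean and standard deviation and uses the plotting position (i - 0.5) / n, neither of which is stated in the report. The function name and input values are hypothetical.

```python
import statistics


def normal_qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal using the sample's own mean and standard deviation
    # (an assumption; a standard normal is another common choice).
    reference = statistics.NormalDist(
        statistics.mean(ordered), statistics.stdev(ordered)
    )
    points = []
    for i, observed in enumerate(ordered, start=1):
        # (i - 0.5) / n is one common plotting position; others exist.
        theoretical = reference.inv_cdf((i - 0.5) / n)
        points.append((theoretical, observed))
    return points


# Hypothetical correlation coefficients, not values from the report.
example = [-0.20, -0.10, -0.05, 0.00, 0.02, 0.05, 0.10, 0.30, 0.73]
qq = normal_qq_points(example)
for theoretical, observed in qq:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

If the coefficients were normally distributed, the printed pairs would lie close to the line where the theoretical and observed values are equal; a long upper tail, such as the single large value here, pulls the last points away from it.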

Table 2. Top 20 genes ranked by correlation coefficient
Gene symbol | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
CELA1 | 1990 | 0.7329 | 5.46007683510652e-13 | 5.06231023766901e-09 |
MAGEE2 | 139599 | 0.4851 | 6.29405611629608e-07 | 0.000583553412822391 |
EMX1 | 2016 | 0.4636 | 4.18978506688816e-06 | 0.00242784951547835 |
TM4SF20 | 79853 | 0.4593 | 2.99552365987665e-05 | 0.0098202627745982 |
HRG | 3273 | 0.4411 | 1.26646143261411e-06 | 0.00102104323238971 |
CRP | 1401 | 0.4251 | 6.19478691676889e-06 | 0.00315492977913851 |
SSTR5 | 6755 | 0.4182 | 7.53746131945121e-05 | 0.01941210350647 |
ENPP7 | 339221 | 0.416 | 2.49461317487487e-05 | 0.00856622446327866 |
GTSF1L | 149699 | 0.4065 | 3.38368747199169e-06 | 0.00216357644114283 |
FOXE1 | 2304 | 0.399 | 1.3156159194061e-05 | 0.00519052467947815 |
CD300E | 342510 | 0.3954 | 1.0235691684457e-05 | 0.00441396350941597 |
EN1 | 2019 | 0.3938 | 3.8699899094663e-05 | 0.0115743907889086 |
KNG1 | 3827 | 0.3794 | 4.09914004606549e-06 | 0.00242784951547835 |
LOC285375 | 285375 | 0.3682 | 0.00171242044508668 | 0.147030362867385 |
GPR123 | 84435 | 0.3602 | 0.000336499822037872 | 0.052434589916372 |
HBG1 | 3047 | 0.3575 | 0.000905282095196469 | 0.102357596897732 |
C1QL2 | 165257 | 0.3567 | 0.00194917231931901 | 0.157145662248402 |
ADH7 | 131 | 0.3561 | 0.000297572100104837 | 0.0479815604542956 |
PAX7 | 5081 | 0.3549 | 0.00298293561193885 | 0.204105442996982 |
NLRP4 | 147945 | 0.3532 | 0.000143184802166907 | 0.0291766569953951 |
This analysis used gene-level (TCGA Level III) mRNAseq expression data and the APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline. A Pearson correlation coefficient was calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.
The correlations were computed with the pairwise.complete.obs option, so each gene's coefficient uses only the samples that have non-missing values for both variables.
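The per-gene calculation described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's own code: the function name, sample IDs, and values are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math


def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both x and y are present.

    Returns (r, n_used); r is NaN when fewer than two complete pairs
    remain or when either variable has zero variance.
    """
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    # Keep only the pairs in which both observations are available.
    pairs = [(a, b) for a, b in zip(x, y) if present(a) and present(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical sample IDs and values, not data from the report.
apobec = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0, "S5": 9.0}
expression = {"S1": 2.0, "S2": 4.0, "S3": None, "S4": 8.0, "S6": 5.0}

# Restrict to samples present in both data sets, then drop incomplete pairs.
common = sorted(apobec.keys() & expression.keys())
r, n_used = pearson_pairwise_complete(
    [apobec[s] for s in common], [expression[s] for s in common]
)
print(common, n_used, round(r, 4))
```

Because incomplete pairs are dropped gene by gene, the number of samples behind each coefficient can differ between genes, which is why two genes with similar coefficients may have different p-values.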