This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to assess whether APOBEC_MutLoad_MinEstimate is associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.1698, -0.1114, -0.06929, -0.0321, 0.0026, 0.0386, 0.07809, 0.1244, and 0.1873, respectively.
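The nine values above are the deciles of the per-gene correlation coefficients. The following is a minimal sketch of that summary using Python's standard library, assuming the coefficients are already collected in a list. The report does not state which quantile definition was used, so `method="inclusive"` (linear interpolation between sorted values, the same rule as R's default `quantile()`) is an assumption.

```python
import statistics


def correlation_deciles(correlations):
    """Return the 10th, 20th, ..., 90th percentiles of a list of coefficients.

    statistics.quantiles(..., n=10) yields the nine cut points that split the
    data into ten equal-probability groups; method="inclusive" interpolates
    linearly between the sorted values (an assumption, not taken from the report).
    """
    return statistics.quantiles(correlations, n=10, method="inclusive")


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only, not the report's data.
    example = [-0.21, -0.15, -0.08, -0.02, 0.0, 0.03, 0.05, 0.11, 0.19, 0.27, 0.48]
    for pct, value in zip(range(10, 100, 10), correlation_deciles(example)):
        print(f"{pct}th percentile: {value:.4f}")
```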
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of these coefficients against a normal distribution. Table 2 lists the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 185 | 184 | 184 |
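The "Common" column counts the samples present in both data sets. The following is a minimal sketch of that intersection, assuming each data set is keyed by sample barcode. The dictionary layout and the example IDs are hypothetical.

```python
def common_samples(mutload_by_sample, expression_by_sample):
    """Return the sorted sample IDs present in both data sets, plus the counts
    that would fill a table like Table 1 (input layout is hypothetical)."""
    shared = sorted(set(mutload_by_sample) & set(expression_by_sample))
    counts = {
        "APOBEC_MutLoad_MinEstimate": len(mutload_by_sample),
        "Expression": len(expression_by_sample),
        "Common": len(shared),
    }
    return shared, counts


if __name__ == "__main__":
    # Hypothetical sample barcodes and values.
    mutload = {"TCGA-AA-0001-01": 12, "TCGA-AA-0002-01": 3, "TCGA-AA-0003-01": 7}
    expression = {"TCGA-AA-0001-01": [5.1, 2.0], "TCGA-AA-0003-01": [4.4, 1.7]}
    shared, counts = common_samples(mutload, expression)
    print(shared)
    print(counts)
```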
Figure 1. Summary figures. Left: histogram of the calculated correlation coefficients (each computed across samples) for all genes. Right: QQ plot of the calculated correlation coefficients, which plots their quantiles against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
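The right panel of Figure 1 pairs the sorted coefficients with standard-normal quantiles. The following is a minimal sketch of how such points can be computed. It assumes R's `ppoints()` plotting positions and a `qqline()`-style reference line through the first and third quartiles. The report does not say how its plot or blue line were drawn, so both conventions are assumptions.

```python
import statistics
from statistics import NormalDist


def plotting_positions(n):
    """Probabilities (i - a) / (n + 1 - 2a), i = 1..n, with a = 3/8 for
    n <= 10 and 1/2 otherwise (R's ppoints() convention, assumed here)."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def normal_qq_points(correlations):
    """Pair each sorted coefficient with the matching standard-normal quantile."""
    sample = sorted(correlations)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf(p) for p in plotting_positions(len(sample))]
    return list(zip(theoretical, sample))


def quartile_reference_line(correlations):
    """Slope and intercept of the line through the (normal, sample) first and
    third quartiles, the default rule of R's qqline() (assumed here)."""
    q1, _, q3 = statistics.quantiles(correlations, n=4, method="inclusive")
    std_normal = NormalDist()
    x1, x3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
    slope = (q3 - q1) / (x3 - x1)
    return slope, q1 - slope * x1
```

Points that fall far from the reference line in the tails indicate heavier or lighter tails than a normal distribution.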

Table 2. Top 20 genes ranked by correlation coefficient
Gene | Entrez ID | Correlation (r) | p-value | q-value |
---|---|---|---|---|
OR2W3 | 343171 | 0.4843 | 2.7538493707624e-08 | 7.38346357006695e-05 |
RNF11 | 26994 | 0.4359 | 6.20910434179223e-10 | 3.88441567622522e-06 |
CYB5R1 | 51706 | 0.4251 | 1.79783521403465e-09 | 8.43544282425057e-06 |
CAPZB | 832 | 0.4076 | 9.33630506239069e-09 | 3.50447546821897e-05 |
IFNK | 56832 | 0.4035 | 3.69880425088454e-06 | 0.000701203617985868 |
PRB1 | 5542 | 0.3984 | 0.00122289206215487 | 0.0167405092797393 |
ASB15 | 142685 | 0.39 | 0.00131964431043574 | 0.0175156184004653 |
TMEM79 | 84283 | 0.3861 | 6.18746489600142e-08 | 0.000144994727327467 |
HSPA2 | 3306 | 0.3847 | 6.95307196263428e-08 | 0.000144994727327467 |
PLEKHG5 | 57449 | 0.3823 | 8.5200481780845e-08 | 0.000153273475794565 |
C1orf74 | 148304 | 0.3817 | 8.9834198302441e-08 | 0.000153273475794565 |
S100A11 | 6282 | 0.3743 | 1.661087956073e-07 | 0.000211094856358329 |
CITED4 | 163732 | 0.3741 | 1.68713919723729e-07 | 0.000211094856358329 |
FAM178B | 51252 | 0.3732 | 2.46857399854861e-07 | 0.000243843141077686 |
SLC38A2 | 54407 | 0.371 | 2.17428239190909e-07 | 0.000240040776066763 |
FOLR3 | 2352 | 0.367 | 3.2072548125095e-05 | 0.00194173413939285 |
APOBEC3A | 200315 | 0.3669 | 3.50869431731127e-07 | 0.000278904205873213 |
KIAA1609 | 57707 | 0.3664 | 3.12661796009195e-07 | 0.000278904205873213 |
MMP28 | 79148 | 0.3659 | 3.25488486829784e-07 | 0.000278904205873213 |
MAP7D1 | 55700 | 0.3649 | 3.52045949281177e-07 | 0.000278904205873213 |
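Table 2 is sorted by correlation coefficient, largest first. The report does not state how its q-values were obtained. The sketch below assumes the Benjamini-Hochberg false discovery rate adjustment (the `"BH"` method of R's `p.adjust()`), and the gene records in the example are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the k-th smallest of m p-values the raw value is p * m / k; a running
    minimum taken from the largest p downwards keeps the result monotone and
    never above 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def top_genes(results, n=20):
    """Attach q-values to (gene_id, r, p) tuples, sort by r (largest first), keep n.

    q-values are computed over all tested genes before truncating to the top n.
    """
    qvalues = bh_qvalues([p for _, _, p in results])
    rows = [(gene, r, p, q) for (gene, r, p), q in zip(results, qvalues)]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    # Hypothetical results: (gene_id, correlation, p-value).
    results = [("GENE_A|1", 0.42, 1e-8), ("GENE_B|2", -0.05, 0.48),
               ("GENE_C|3", 0.31, 2e-4), ("GENE_D|4", 0.12, 0.09)]
    for row in top_genes(results, n=3):
        print(*row, sep=" | ")
```

The adjustment is applied to the full gene list rather than to the displayed rows. Otherwise the number of tests, m, would be understated and the q-values would be too small.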
Gene-level (TCGA Level III) mRNAseq expression data and the APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and the expression of each gene were calculated across all samples common to both data sets.
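The following is a minimal per-gene sketch of that calculation using only Python's standard library. The report does not say how its p-values were computed. This sketch assumes the standard t-test for a Pearson coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and evaluates the two-sided t tail through the regularized incomplete beta function (continued-fraction evaluation). The input layout (`mutload` as sample → value, `expression` as gene → {sample → value}) is hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b); uses I_x(a, b) = 1 - I_{1-x}(b, a) where that converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either input is constant
    return sxy / math.sqrt(sxx * syy)


def cor_test(x, y):
    """Pearson r and its two-sided p-value for H0: rho = 0 (t-test, df = n - 2)."""
    n = len(x)
    if n < 3:
        return float("nan"), float("nan")
    r = pearson_r(x, y)
    if math.isnan(r):
        return r, float("nan")
    if abs(r) >= 1.0:
        return r, 0.0
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    # Two-sided tail of Student's t: I_{df/(df+t^2)}(df/2, 1/2).
    return r, regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def correlate_genes(mutload, expression):
    """Correlate each gene with the mutation load over their shared samples."""
    results = {}
    for gene, profile in expression.items():
        samples = sorted(set(mutload) & set(profile))
        results[gene] = cor_test([mutload[s] for s in samples],
                                 [profile[s] for s in samples])
    return results
```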
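Correlations were computed using pairwise.complete.obs, so each gene's coefficient is based only on the samples with non-missing values for both APOBEC_MutLoad_MinEstimate and that gene's expression.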
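Under pairwise.complete.obs, a sample contributes to a gene's coefficient only if both of its values are observed, so the effective sample size can differ from gene to gene. This is consistent with, for example, PRB1 and ASB15 in Table 2 having much larger p-values than genes with similar coefficients. The following is a minimal sketch of the filtering step, assuming missing values are encoded as `None` or NaN (an assumption about the input format):

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (assumed encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pairwise_complete(x, y):
    """Keep only the positions where both x and y are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if not is_missing(a) and not is_missing(b)]
    return [a for a, _ in pairs], [b for _, b in pairs]


def pearson_pairwise(x, y):
    """Pearson r over complete pairs, plus the number of pairs actually used."""
    xs, ys = pairwise_complete(x, y)
    n = len(xs)
    if n < 2:
        return float("nan"), n
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n
```

Returning the pair count alongside r makes it easy to see which genes were tested on fewer samples than the 184 common ones.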