This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is also associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0886, -0.0593, -0.0375, -0.0192, -0.0013, 0.017, 0.03689, 0.0603, and 0.09273, respectively.
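The decile summary above can be reproduced from a list of per-gene coefficients. The following is a minimal sketch that assumes the common linear-interpolation definition of a percentile; the report does not state which definition it used, and the input values are hypothetical, not taken from the report.

```python
import math

def percentile(values, pct):
    """Linearly interpolated percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(ordered) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def deciles(values):
    """Return the 10th, 20th, ..., 90th percentiles keyed by percentile."""
    return {p: percentile(values, p) for p in range(10, 100, 10)}

# Hypothetical correlation coefficients (illustration only).
example_cors = [-0.2, -0.1, -0.05, 0.0, 0.02, 0.04, 0.08, 0.1, 0.15, 0.3, 0.35]
summary = deciles(example_cors)
```

A median close to zero with roughly symmetric outer deciles, as in the values reported above, suggests that most genes show little linear association with the mutation load estimate.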
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 510 | 520 | 502 |
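
The three counts in Table 1 correspond to the sizes of two sample sets and their intersection. A minimal sketch of that bookkeeping follows; the function name and the sample identifiers are hypothetical and only illustrate the idea.

```python
def count_samples(apobec_ids, expression_ids):
    """Count samples in each data set and the samples shared by both."""
    apobec = set(apobec_ids)
    expression = set(expression_ids)
    return {
        "APOBEC_MutLoad_MinEstimate": len(apobec),
        "Expression": len(expression),
        "Common": len(apobec & expression),  # set intersection
    }

# Hypothetical sample identifiers (illustration only).
apobec_ids = ["S01", "S02", "S03", "S04"]
expression_ids = ["S02", "S03", "S04", "S05", "S06"]
counts = count_samples(apobec_ids, expression_ids)
```

Only the samples counted under "Common" can contribute to a correlation, since both a mutation load estimate and an expression value are needed for each sample.
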
Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlations across samples for all genes. Right: QQ plot of the calculated correlations across samples. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
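
The points of a normal QQ plot can be computed without a plotting library. The sketch below pairs each sorted coefficient with a standard normal quantile; the plotting positions `(i - 0.5) / n` are an assumption, since the report does not say which convention was used, and the input values are hypothetical.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, observed in enumerate(ordered, start=1):
        # Assumed plotting position for the i-th smallest value.
        prob = (i - 0.5) / n
        points.append((standard_normal.inv_cdf(prob), observed))
    return points

# Hypothetical correlation coefficients (illustration only).
example_cors = [0.05, -0.12, 0.0, 0.3, -0.04]
points = qq_points(example_cors)
```

If the coefficients were normally distributed, these points would fall close to a straight line; a heavy upper tail shows up as points curving away from the line at the right-hand end.
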

Table 2. Top 20 genes ranked by correlation coefficients
Gene symbol | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
GAGE4 | 2576 | 0.3616 | 6.31556404928624e-06 | 0.00218867555570412 |
RAX | 30062 | 0.3038 | 2.421508061623e-06 | 0.00148570308619631 |
PTCHD3 | 374308 | 0.2981 | 0.000111302217288944 | 0.0104362864762052 |
CPN1 | 1369 | 0.2964 | 0.000105253721326637 | 0.0102884046407866 |
WDR87 | 83889 | 0.2582 | 0.000215163202629043 | 0.0148099975202867 |
C1orf126 | 200197 | 0.2476 | 2.09640704795788e-08 | 0.0001301050076657 |
LTA4H | 4048 | 0.246 | 2.35184123287269e-08 | 0.0001301050076657 |
KIR2DL1 | 3802 | 0.2431 | 3.91976157509966e-05 | 0.00610486256162555 |
CDCP1 | 64866 | 0.2368 | 7.96948989023605e-08 | 0.000209233264575369 |
KLRC2 | 3822 | 0.2331 | 2.55950215466427e-07 | 0.000414163600778917 |
KYNU | 8942 | 0.2321 | 1.44266028456741e-07 | 0.000294591230108665 |
CT45A5 | 441521 | 0.2313 | 0.00201273761640564 | 0.0501899687041702 |
? | 340602 | 0.2301 | 0.0024657148843743 | 0.0559126054378087 |
GOLGA6L1 | 283767 | 0.2299 | 1.29231322532064e-06 | 0.00131945180305237 |
CLEC7A | 64581 | 0.2297 | 1.95423442539422e-07 | 0.00035914920269895 |
MAGEA4 | 4103 | 0.2283 | 3.44356890424891e-06 | 0.00158214773305716 |
KIR2DL4 | 3805 | 0.2276 | 4.81650013428592e-07 | 0.000590117596452711 |
GLDC | 2731 | 0.2263 | 3.06303240549255e-07 | 0.000433018534985708 |
CALCA | 796 | 0.2235 | 0.00449246621456134 | 0.0755301676408008 |
SHISA5 | 51246 | 0.2172 | 8.9825179294678e-07 | 0.000971063026516231 |
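
The q-value column in Table 2 is a multiple-testing adjustment of the p-value column. The report does not state which adjustment was applied, so the sketch below uses the Benjamini-Hochberg procedure purely as an assumed, commonly used example; its output should not be expected to reproduce the q-values in the table. The function name and input values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values (illustration only).
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = bh_adjust(example_p)
```

Because the adjustment depends on the number of genes tested and on each p-value's rank among all of them, two genes with identical adjusted values can have different raw p-values.
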
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.
The correlations were computed with the pairwise.complete.obs option (as in the `use` argument of R's `cor` function), so each gene's coefficient is based only on the samples with non-missing values for both variables.
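
The effect of pairwise-complete handling can be illustrated with a small stdlib-only sketch. This is not the pipeline's code: the function name, the use of `None` or NaN to mark missing values, and the example data are all assumptions made for illustration.

```python
import math

def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both x and y are present.

    Returns (r, n_used); r is NaN if fewer than two complete pairs
    remain or if either variable is constant over those pairs.
    """
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    # Keep only the samples observed in both vectors.
    pairs = [(a, b) for a, b in zip(x, y) if present(a) and present(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical values for one gene; None and NaN mark missing measurements.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r, n_used = pearson_pairwise_complete(mut_load, expression)
```

Here only three of the five samples contribute. Applied gene by gene, this means the effective sample size can differ between genes.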