This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0882, -0.059, -0.0376, -0.0193, -0.0015, 0.0169, 0.0366, 0.0601, 0.0934, respectively.
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
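A decile summary of this kind can be sketched with Python's standard library. This is only an illustration: the list `cors` is hypothetical stand-in data rather than the pipeline's output, and the interpolation rule (`method="inclusive"`) is an assumption that may differ from what the pipeline used.

```python
from statistics import quantiles

def decile_summary(cors):
    """Map each percentile (10, 20, ..., 90) to its correlation coefficient."""
    # n=10 returns the nine cut points that split the data into ten groups.
    # method="inclusive" is an assumed choice: it treats the input as the
    # whole population of genes rather than a sample from it.
    cuts = quantiles(cors, n=10, method="inclusive")
    return dict(zip(range(10, 100, 10), cuts))

# Hypothetical coefficients, evenly spaced from -0.50 to 0.50.
cors = [i / 100 for i in range(-50, 51)]
summary = decile_summary(cors)
```

A median close to zero with roughly symmetric outer deciles, as in the values reported above, indicates that most genes show little linear association with APOBEC_MutLoad_MinEstimate.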
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 511 | 520 | 503 |
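
The counts in Table 1 correspond to a set intersection on sample identifiers. The sketch below shows the idea under stated assumptions: the function name `count_samples` and the sample IDs are hypothetical, and the real pipeline may match samples differently (for example, on truncated barcodes).

```python
def count_samples(mutload_ids, expression_ids):
    """Count samples in each data set and the samples shared by both."""
    mutload = set(mutload_ids)
    expression = set(expression_ids)
    common = mutload & expression  # samples present in both data sets
    return {
        "APOBEC_MutLoad_MinEstimate": len(mutload),
        "Expression": len(expression),
        "Common": len(common),
    }

# Hypothetical sample identifiers, not real barcodes.
counts = count_samples(["S1", "S2", "S3"], ["S2", "S3", "S4", "S5"])
```

Only the common samples (503 in Table 1) can contribute to the correlation for any gene.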
Figure 1. Summary figures. Left: histogram showing the distribution of the correlation coefficients, calculated across samples, for all genes. Right: QQ plot of the calculated correlation coefficients, in which their quantiles are plotted against the corresponding quantiles of a normal distribution. Points deviating from the blue line indicate deviation from normality.
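The coordinates behind a QQ plot like the right panel can be sketched as follows. This is a minimal illustration, not the pipeline's plotting code: the function name `qq_points` is hypothetical, the normal distribution is fitted to the data's own mean and standard deviation, and the plotting positions (i - 0.5) / n are one common convention assumed here.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: compare against a normal with the sample mean and std dev.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical correlation coefficients.
points = qq_points([0.1, -0.2, 0.0, 0.2, -0.1])
```

If the coefficients were normally distributed, the (theoretical, observed) pairs would fall close to a straight reference line; heavy tails appear as points bending away from it at either end.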

Table 2. Top 20 genes ranked by correlation coefficient
Gene symbol | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
GAGE4 | 2576 | 0.3557 | 6.90672372849477e-06 | 0.00275938627570167 |
CPN1 | 1369 | 0.3265 | 1.56144580212292e-05 | 0.00407112590048528 |
RAX | 30062 | 0.2944 | 4.60858122020014e-06 | 0.0021353449676404 |
KIR2DL1 | 3802 | 0.2593 | 1.07143369818541e-05 | 0.00312552515956372 |
TCL1B | 9623 | 0.257 | 0.000371209246404325 | 0.0206729803952081 |
LTA4H | 4048 | 0.2569 | 5.04571251447317e-09 | 9.27301045909878e-05 |
MAGEA1 | 4100 | 0.2528 | 6.26344467222761e-05 | 0.00844581122176194 |
? | 340602 | 0.2508 | 0.000761353931289399 | 0.0305669812278026 |
PTCHD3 | 374308 | 0.2496 | 0.00131523262413058 | 0.038731032034741 |
GOLGA6L1 | 283767 | 0.2458 | 2.21354630847159e-07 | 0.000581150772244156 |
CTAG2 | 30848 | 0.2375 | 0.000939778129957602 | 0.0338651813183545 |
CT45A5 | 441521 | 0.2366 | 0.00147750845544659 | 0.0420761123405153 |
GLDC | 2731 | 0.2334 | 1.23103093274679e-07 | 0.000581150772244156 |
C1orf126 | 200197 | 0.2326 | 1.43737367563901e-07 | 0.000581150772244156 |
TRIM15 | 89870 | 0.2324 | 1.24547276270093e-06 | 0.00104042265604171 |
KLRC2 | 3822 | 0.2305 | 3.49768792595739e-07 | 0.0007142278744805 |
CDCP1 | 64866 | 0.2299 | 1.86286452930062e-07 | 0.000581150772244156 |
FAM75C1 | 441452 | 0.2297 | 0.0026603650557937 | 0.0549968380150468 |
MAGEA4 | 4103 | 0.2288 | 3.27932536414721e-06 | 0.00177257181006757 |
DPPA2 | 151871 | 0.228 | 0.00341774010805995 | 0.0623930335252282 |
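
The report does not state how the q-values were derived from the p-values. Purely as an assumption, the sketch below shows the Benjamini-Hochberg adjustment, one widely used way to obtain false-discovery-rate-adjusted values; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the p-value decreases (the step-up monotonicity rule).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for four genes.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under this kind of adjustment, several genes with different p-values can share one adjusted value, because of the running minimum. Identical q-values for different p-values, as seen in some rows of Table 2, would be expected from such a procedure.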
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used in this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and the expression of each gene were calculated across all samples common to both data sets.
Pearson correlations were computed with the pairwise.complete.obs option, so for each gene only the samples with non-missing values for both variables contribute to the coefficient.
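The option name pairwise.complete.obs matches the `use` argument of R's `cor` function. A rough Python equivalent of that behaviour for a single gene is sketched below. The function name `pearson_pairwise_complete` and the example values are hypothetical, and representing missing values as `None` or NaN is an assumption about the input.

```python
from math import isnan, sqrt

def pearson_pairwise_complete(x, y):
    """Pearson r using only positions where both x and y are present."""
    # Keep a sample only if neither value is missing (None or NaN).
    pairs = [
        (a, b) for a, b in zip(x, y)
        if a is not None and b is not None and not isnan(a) and not isnan(b)
    ]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough complete pairs
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant input
    return sxy / sqrt(sxx * syy)

# Hypothetical values: one gene's expression vs. mutation load in 5 samples.
mutload = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r = pearson_pairwise_complete(mutload, expression)
```

Because incomplete samples are dropped per gene, the effective sample size can differ from gene to gene, which in turn affects the p-value attached to a given coefficient.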