This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.1028, -0.0685, -0.0426, -0.0206, 0.0011, 0.0233, 0.0464, 0.075, and 0.1152, respectively.
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
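The percentile summary above could be reproduced along the following lines. This is a minimal sketch using only the Python standard library; the function name `decile_summary`, the toy input, and the choice of inclusive interpolation are assumptions for illustration, not details taken from the pipeline.

```python
import statistics

def decile_summary(correlations):
    """Return the 10th..90th percentile cut points of a list of correlations."""
    # n=10 yields nine cut points; "inclusive" interpolates within the observed range
    cuts = statistics.quantiles(correlations, n=10, method="inclusive")
    return dict(zip(range(10, 100, 10), cuts))

# Toy input (hypothetical values, not taken from the report)
toy = [i / 100 for i in range(-50, 51)]  # 101 evenly spaced values
summary = decile_summary(toy)
for pct, value in summary.items():
    print(f"{pct}th percentile: {value:.4f}")
```

A different interpolation rule would shift the cut points slightly, which matters little when there are thousands of genes.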
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 533 | 515 | 478 |
Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlations across samples for all genes. Right: QQ plot of the calculated correlations across samples. The QQ plot shows the quantiles of the calculated correlation coefficients against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
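The points of a QQ plot like the one described in the caption can be computed as sketched below. The helper name `normal_qq_points`, the plotting positions (i - 0.5) / n, and fitting the normal to the sample mean and standard deviation are all assumptions; the report does not say how its plot was constructed.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here)
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

# Toy correlations (hypothetical values, not taken from the report)
points = normal_qq_points([0.1, -0.2, 0.0, 0.2, -0.1])
for expected, observed in points:
    print(f"theoretical={expected:+.4f}  observed={observed:+.4f}")
```

If the observed values follow a normal distribution, these pairs fall close to a straight line.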

Table 2. Top 20 genes ranked by correlation coefficient
geneID | cor | p-value | q-value |
---|---|---|---|
C13orf34\|79866 | 0.3093 | 4.65094629475971e-12 | 5.33958461890194e-08 |
FBXO45\|200933 | 0.308 | 5.82955905770177e-12 | 5.33958461890194e-08 |
RNF219\|79596 | 0.301 | 1.81747950023237e-11 | 1.10981356549189e-07 |
SIM1\|6492 | 0.2926 | 4.83646181610453e-05 | 0.00178267895390783 |
PNMA5\|114824 | 0.2874 | 7.15645187554514e-09 | 7.74373391554395e-06 |
IL20RB\|53833 | 0.2809 | 4.03600930454218e-10 | 1.8483913612477e-06 |
SPHKAP\|80309 | 0.28 | 2.40952751846546e-05 | 0.00110989520062473 |
PAK2\|5062 | 0.2765 | 7.80939313216322e-10 | 2.38433787980163e-06 |
SIAH2\|6478 | 0.2751 | 9.50584055914305e-10 | 2.48767847432774e-06 |
FOXG1\|2290 | 0.2738 | 0.000143770948294986 | 0.00380598266158359 |
TAC3\|6866 | 0.2714 | 8.25541619990933e-05 | 0.0025588996508653 |
LOC100130386\|100130386 | 0.2705 | 0.000391302644334068 | 0.00745142738207464 |
MAGEA6\|4105 | 0.2681 | 9.28861340598885e-06 | 0.000559422827948153 |
TBL1XR1\|79718 | 0.2655 | 3.75794817486508e-09 | 6.88418526153534e-06 |
MYO18B\|84700 | 0.2653 | 4.90975794020798e-07 | 8.73221900064757e-05 |
ADAM21P1\|145241 | 0.2652 | 1.43647370598465e-06 | 0.00016243680135761 |
C18orf2\|56651 | 0.2649 | 1.75202592540558e-05 | 0.000857967434898444 |
ZNF675\|171392 | 0.2637 | 4.7679415970947e-09 | 6.97312305741882e-06 |
ZNF200\|7752 | 0.2633 | 5.03707542343079e-09 | 6.97312305741882e-06 |
WDR53\|348793 | 0.2629 | 5.32909671946413e-09 | 6.97312305741882e-06 |
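The report does not state how the q-values in Table 2 were derived. Several rows share a q-value despite having different p-values, which is what a step-up false discovery rate adjustment produces, so the sketch below uses the Benjamini-Hochberg procedure as an assumed stand-in. The function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Toy p-values (hypothetical, not taken from Table 2)
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
print(toy_q)
```

The running minimum is what makes neighbouring ranks share a q-value when a larger p-value yields a smaller raw adjustment.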
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and the expression of each gene across all samples common to both data sets.
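Pearson correlations were computed with the pairwise.complete.obs option, so for each gene only the samples with non-missing values for both variables contribute to the coefficient.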
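The pairwise-complete calculation described above can be illustrated with a short standard-library sketch. The function name `pearson_pairwise`, the use of `None` or NaN to mark missing values, and the toy data are assumptions for illustration; this is not the pipeline's own code.

```python
import math

def pearson_pairwise(x, y):
    """Pearson r over positions where both x and y are present.

    Returns (r, n_used); r is NaN when fewer than two complete pairs
    remain or when either variable is constant.
    """
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    # Keep only samples observed in both vectors (pairwise-complete)
    pairs = [(a, b) for a, b in zip(x, y) if present(a) and present(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n

# Toy values for one gene (hypothetical, not taken from the report)
apobec_load = [1.0, 2.0, None, 4.0, 5.0]
gene_expr = [2.0, 4.1, 6.0, float("nan"), 9.9]
r, n_used = pearson_pairwise(apobec_load, gene_expr)
print(f"r={r:.4f} from {n_used} complete pairs")
```

Because the number of complete pairs can differ from gene to gene, two genes with similar coefficients can have quite different p-values.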