This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0945, -0.0647, -0.04202, -0.0244, -0.0071, 0.0107, 0.0306, 0.05348, and 0.08384, respectively.
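As an illustration of how such decile cut points can be obtained from a list of per-gene correlation coefficients, here is a minimal sketch using Python's standard library. The report does not state which interpolation rule was used, so the `inclusive` method is an assumption, and `deciles` and `example_cors` are hypothetical names with made-up values.

```python
import statistics

def deciles(values):
    """Return the 10th, 20th, ..., 90th percentiles of a list of numbers."""
    # n=10 yields the 9 cut points that split the data into ten groups.
    # method="inclusive" interpolates linearly between sorted values,
    # treating the data as the whole population (an assumption here).
    return statistics.quantiles(values, n=10, method="inclusive")

# Hypothetical correlation coefficients, not taken from the report.
example_cors = [-0.20, -0.10, -0.05, -0.02, 0.00, 0.01,
                0.03, 0.06, 0.09, 0.15, 0.30]
cuts = deciles(example_cors)
for pct, value in zip(range(10, 100, 10), cuts):
    print(f"{pct}th percentile: {value:.4f}")
```

A different interpolation rule would shift the cut points slightly, which matters little when the coefficients number in the thousands.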
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 290 | 367 | 290 |
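The counts in Table 1 amount to a set intersection over sample identifiers. The sketch below shows one way to derive them; the function name `count_samples` and the sample identifiers are hypothetical and only illustrate the idea.

```python
def count_samples(apobec_ids, expression_ids):
    """Count samples in each data set and the samples shared by both."""
    apobec = set(apobec_ids)
    expression = set(expression_ids)
    common = apobec & expression  # samples usable for the correlation
    return {
        "APOBEC_MutLoad_MinEstimate": len(apobec),
        "Expression": len(expression),
        "Common": len(common),
    }

# Hypothetical sample identifiers, not taken from the report.
apobec_ids = ["S01", "S02", "S03"]
expression_ids = ["S01", "S02", "S03", "S04", "S05"]
counts = count_samples(apobec_ids, expression_ids)
print(counts)
```

When every sample with a mutation-load estimate also has expression data, as in Table 1, the common count equals the smaller of the two set sizes.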
Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlation coefficients for all genes. Right: QQ plot of the calculated correlation coefficients. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
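The coordinates behind a QQ plot like the right panel can be sketched as follows. This is a sketch under stated assumptions: the reference normal distribution is fitted to the sample mean and standard deviation, and the plotting positions are (i - 0.5) / n. The report specifies neither choice, and `qq_points` is a hypothetical helper.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: compare against a normal with the sample mean and stdev.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical correlation coefficients, not taken from the report.
points = qq_points([-0.2, -0.1, 0.0, 0.1, 0.2])
for theo, obs in points:
    print(f"theoretical={theo:+.4f} observed={obs:+.4f}")
```

Points lying close to the line y = x indicate that the observed coefficients are approximately normally distributed.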

Table 2. Top 20 genes ranked by correlation coefficient
gene symbol | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
AHSG | 197 | 0.3797 | 3.64947199917864e-05 | 0.0549458421743005 |
PRB2 | 653247 | 0.351 | 0.000523283548035547 | 0.163213817218766 |
AMY1A | 276 | 0.3382 | 3.11037913647283e-07 | 0.00280976099293273 |
PRB1 | 5542 | 0.3228 | 0.00111732722117397 | 0.225294741216969 |
PARP6 | 56965 | 0.3195 | 2.62622421587366e-08 | 0.000474479929081895 |
LOC153328 | 153328 | 0.3076 | 0.000969013082315229 | 0.213501943392552 |
ITLN2 | 142683 | 0.2689 | 0.000118288649798126 | 0.0926222638907202 |
GAGE12J | 729396 | 0.2618 | 0.00372294721519539 | 0.344428288745074 |
PRB3 | 5544 | 0.2617 | 0.00647732198136652 | 0.423185745506196 |
ITGB1BP1 | 9270 | 0.2557 | 1.03413674445996e-05 | 0.0311395809369301 |
HAO2 | 51179 | 0.2515 | 0.00068438386661196 | 0.180004570286463 |
APOA2 | 336 | 0.2509 | 0.0136709041178493 | 0.491038220073924 |
LOC285033 | 285033 | 0.2475 | 2.01296175987853e-05 | 0.0398077669095581 |
OR7E5P | 219445 | 0.2434 | 0.0193642893335331 | 0.540735931249808 |
FGF5 | 2250 | 0.2424 | 0.00104218865985661 | 0.213968437700334 |
RPL23AP82 | 284942 | 0.2412 | 3.29491461945608e-05 | 0.0541174749361027 |
CALHM3 | 119395 | 0.236 | 0.0153427263015629 | 0.507454986682769 |
NBPF4 | 148545 | 0.2358 | 0.000318022438845489 | 0.145724059006324 |
VIL1 | 7429 | 0.2331 | 0.0172582734665108 | 0.515380540032149 |
ADK | 132 | 0.2329 | 6.23935892352101e-05 | 0.0751509984475027 |
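The q-value column is a multiple-testing adjustment of the p-values, but the report does not state which method produced it. Purely as an illustration, the sketch below assumes the Benjamini-Hochberg procedure. The function name `benjamini_hochberg` and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * n / rank over its own rank and all larger ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical p-values, not taken from the report.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
print(q)
```

Because the adjustment depends on the rank of each p-value among all tested genes, a gene with a large correlation coefficient can still receive a large q-value, as several rows of Table 2 show.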
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and each gene were calculated across all samples common to both data sets.
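Pearson correlation was computed with the pairwise.complete.obs option, so for each gene only samples with non-missing values for both APOBEC_MutLoad_MinEstimate and expression contribute to the coefficient.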
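To make the handling of missing values concrete, here is a minimal sketch of a Pearson correlation restricted to pairwise-complete observations. It is not the pipeline's own code. The function `pearson_pairwise_complete` and the example vectors are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def pearson_pairwise_complete(x, y):
    """Pearson r using only positions where both x and y are present."""
    pairs = [
        (a, b) for a, b in zip(x, y)
        if a is not None and b is not None
        and not math.isnan(a) and not math.isnan(b)
    ]
    n = len(pairs)
    if n < 2:
        return float("nan")  # too few complete pairs to correlate
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values; NaN marks a missing measurement.
nan = float("nan")
mut_load = [1.0, 2.0, nan, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, nan, 10.0]
r = pearson_pairwise_complete(mut_load, expression)
print(r)
```

Here the third and fourth samples are dropped because one of the two values is missing, so the coefficient is computed from the remaining three pairs.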