This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples to determine whether APOBEC_MutLoad_MinEstimate is also associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.10232, -0.0682, -0.0421, -0.0203, 0.0011, 0.0232, 0.0463, 0.07454, and 0.1143, respectively.
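A percentile summary like this one can be sketched in a few lines of Python. The function name `decile_cut_points` and the toy coefficients are hypothetical, and the interpolation rule (`method="inclusive"`) is an assumption: the report does not say which quantile definition the pipeline used.

```python
import statistics


def decile_cut_points(correlations):
    """Return the nine cut points (10th to 90th percentile) of the values.

    Assumption: "inclusive" interpolation, which treats the input as the
    whole population of coefficients rather than a sample from it.
    """
    return statistics.quantiles(correlations, n=10, method="inclusive")


# Hypothetical per-gene correlation coefficients, NOT values from the report.
toy_correlations = [i / 10 for i in range(-5, 6)]  # -0.5, -0.4, ..., 0.5
cuts = decile_cut_points(toy_correlations)

for percentile, value in zip(range(10, 100, 10), cuts):
    print(f"{percentile}th percentile: {value:+.4f}")
```

Different quantile definitions give slightly different cut points on small inputs; across many thousands of genes the differences are usually negligible.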
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 533 | 515 | 478 |
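The three counts in Table 1 correspond to two sets of sample identifiers and their intersection. A minimal sketch, assuming each data set can be reduced to a list of sample IDs; the function name and the IDs below are hypothetical:

```python
def summarize_sample_overlap(mutation_samples, expression_samples):
    """Count the samples in each data set and those common to both."""
    mutation_set = set(mutation_samples)
    expression_set = set(expression_samples)
    common = sorted(mutation_set & expression_set)
    counts = {
        "APOBEC_MutLoad_MinEstimate": len(mutation_set),
        "Expression": len(expression_set),
        "Common": len(common),
    }
    return counts, common


# Hypothetical sample identifiers, not taken from the report.
mutation_ids = ["S01", "S02", "S03", "S04"]
expression_ids = ["S02", "S03", "S05"]
counts, common_ids = summarize_sample_overlap(mutation_ids, expression_ids)
print(counts, common_ids)
```

Only the common samples can contribute to a correlation, and a gene with missing expression values may use fewer still.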
Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlations across samples for all genes. Right: QQ plot of the calculated correlations across samples. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution. Points that fall off the blue line indicate deviation from normality.
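The coordinates behind a QQ plot of this kind can be computed without a plotting library. The sketch below is illustrative only. It assumes the reference normal distribution is fitted to the sample mean and standard deviation and uses plotting positions (i + 0.5) / n; the report does not state either choice, and `normal_qq_points` is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: compare against a normal with the sample's mean and stdev.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical correlation coefficients, not values from the report.
points = normal_qq_points([0.1, -0.2, 0.0, 0.3, -0.1])
for expected, observed in points:
    print(f"theoretical={expected:+.4f}  observed={observed:+.4f}")
```

If the coefficients were normally distributed, the (theoretical, observed) pairs would lie close to the identity line.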

Table 2. Top 20 genes ranked by correlation coefficients
geneID | cor | p-value | q-value |
---|---|---|---|
FBXO45\|200933 | 0.3073 | 6.55764331725095e-12 | 6.59729739727055e-08 |
C13orf34\|79866 | 0.3067 | 7.20268289455817e-12 | 6.59729739727055e-08 |
RNF219\|79596 | 0.3004 | 2.01083594220108e-11 | 1.22788345417272e-07 |
PNMA5\|114824 | 0.2896 | 5.39717825986941e-09 | 7.54235519878262e-06 |
SIM1\|6492 | 0.2876 | 6.59030846528186e-05 | 0.00234423030632036 |
SPHKAP\|80309 | 0.2849 | 1.7051355865938e-05 | 0.000882383582226322 |
IL20RB\|53833 | 0.2792 | 5.2491055946291e-10 | 2.40395913470026e-06 |
FOXG1\|2290 | 0.2763 | 0.000124080473687282 | 0.00352954999608281 |
PAK2\|5062 | 0.2762 | 8.15381984153873e-10 | 2.53996227883633e-06 |
SIAH2\|6478 | 0.2755 | 9.02268926239458e-10 | 2.53996227883633e-06 |
LOC100130386\|100130386 | 0.273 | 0.000343789971467112 | 0.00718117273353025 |
TBL1XR1\|79718 | 0.267 | 3.0317259813728e-09 | 5.55381882527684e-06 |
MAGEA6\|4105 | 0.2636 | 1.32837173159039e-05 | 0.000737407325788013 |
MYO18B\|84700 | 0.2633 | 6.03048508462933e-07 | 0.000108188530717285 |
ZNF675\|171392 | 0.2633 | 5.07515740544306e-09 | 7.54235519878262e-06 |
TAC3\|6866 | 0.2628 | 0.000140380014420405 | 0.00386709997619157 |
ZNF200\|7752 | 0.2625 | 5.67391555961194e-09 | 7.54235519878262e-06 |
C18orf2\|56651 | 0.2624 | 2.10560656133119e-05 | 0.00103135311756754 |
WDR53\|348793 | 0.2619 | 6.17584627882195e-09 | 7.54235519878262e-06 |
ADAM21P1\|145241 | 0.2615 | 2.04263919401093e-06 | 0.000220112396441684 |
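The report does not name the method used to turn p-values into q-values. Tied q-values across neighbouring rows of Table 2 are what a step-up false discovery rate procedure would produce, so the sketch below uses the Benjamini-Hochberg adjustment purely as an assumption; the pipeline's actual method may differ. The function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases; this step is what creates ties.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, not taken from Table 2.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
print(toy_q)
```

In this toy input the two smallest p-values receive the same adjusted value, which mirrors the kind of tie visible in the table.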
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.
Missing values were handled with the pairwise.complete.obs option, so each gene's coefficient uses only the samples in which both the mutation load estimate and that gene's expression value are present.
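To make the pairwise-complete behaviour concrete, here is a minimal Python sketch, not the pipeline's own code. It drops any sample where either value is missing before computing Pearson's r for each gene. The function name, the gene names, and the toy data are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math


def pearson_pairwise_complete(x, y):
    """Pearson r over the positions where both x and y are present.

    Returns (r, n_used). r is NaN when fewer than two complete pairs
    remain or when either variable is constant over those pairs.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    pairs = [(a, b) for a, b in zip(x, y) if not missing(a) and not missing(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical toy data: one mutation load value per sample, and one
# expression vector per gene in the same sample order.
mut_load = [1.0, 2.0, 3.0, 4.0, None]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [4.0, None, 2.0, 1.0, 0.0],
}
results = {gene: pearson_pairwise_complete(mut_load, values)
           for gene, values in expression.items()}
for gene, (r, n_used) in results.items():
    print(f"{gene}: r={r:+.4f} from {n_used} samples")
```

Because the number of usable samples can differ from gene to gene, two genes with similar coefficients can have quite different p-values.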