This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is also associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.1399, -0.0963, -0.0656, -0.0369, -0.0086, 0.0194, 0.0497, 0.08814, and 0.14597, respectively.
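The deciles above summarize the per-gene coefficients. As a minimal sketch, assuming the percentiles are taken by linear interpolation between order statistics (R's default quantile rule; the pipeline's actual rule is not stated), they could be computed with Python's standard library as follows. The function name and the input list are hypothetical toy values, not the real coefficients.

```python
import statistics


def correlation_deciles(correlations):
    """Return the 10th, 20th, ..., 90th percentiles of per-gene correlations.

    method="inclusive" interpolates linearly between order statistics
    (minimum = 0th percentile, maximum = 100th percentile). That this
    matches the pipeline's percentile rule is an assumption.
    """
    return statistics.quantiles(correlations, n=10, method="inclusive")


# Toy coefficients for illustration only.
toy = [-0.2, -0.1, 0.0, 0.1, 0.2]
for pct, value in zip(range(10, 100, 10), correlation_deciles(toy)):
    print(f"{pct}th percentile: {value:.4f}")
```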
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of these coefficients against a normal distribution. Table 2 shows the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 578 | 599 | 554 |
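The "Common" column is the overlap of the two sample lists. A minimal sketch, assuming sample identifiers are already directly comparable across the two data sets (any barcode harmonization step is not described here); the function name and the sample IDs are hypothetical:

```python
def sample_counts(mutload_samples, expression_samples):
    """Count samples in each data set and in their intersection.

    Assumes identifiers are directly comparable strings; duplicates are
    collapsed so each sample is counted once.
    """
    mutload = set(mutload_samples)
    expression = set(expression_samples)
    return {
        "APOBEC_MutLoad_MinEstimate": len(mutload),
        "Expression": len(expression),
        "Common": len(mutload & expression),
    }


# Hypothetical sample IDs for illustration.
print(sample_counts(["S1", "S2", "S3"], ["S2", "S3", "S4", "S5"]))
# → {'APOBEC_MutLoad_MinEstimate': 3, 'Expression': 4, 'Common': 2}
```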
Figure 1. Summary figures. Left: histogram showing the distribution of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the calculated correlation coefficients. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution; points deviating from the blue line indicate deviation from normality.
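The QQ plot pairs each sorted coefficient with a standard-normal quantile. A minimal sketch of the coordinates, assuming R-style plotting positions (as in R's ppoints()) and a reference line through the first and third quartiles (as in R's qqline()); whether the figure's blue line was drawn this way is not stated. The function names are hypothetical, and plotting itself is left out.

```python
from statistics import NormalDist, quantiles


def qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile.

    Plotting positions follow R's ppoints(): (i - a) / (n + 1 - 2a),
    with a = 3/8 for n <= 10 and a = 1/2 otherwise.
    """
    xs = sorted(values)
    n = len(xs)
    a = 3 / 8 if n <= 10 else 0.5
    std_normal = NormalDist()
    theoretical = [
        std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, xs))


def qq_reference_line(values):
    """Slope and intercept of a line through the first and third quartiles.

    Mirrors the idea of R's qqline(); the figure's actual line may differ.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    z1, z3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


# Toy coefficients for illustration only.
toy = [0.05, -0.12, 0.30, -0.02, 0.11, -0.07]
print(qq_points(toy))
print(qq_reference_line(toy))
```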

Table 2. Top 20 genes ranked by correlation coefficient
Gene (symbol\|Entrez ID) | Correlation (r) | p-value | q-value |
---|---|---|---|
TMEM79\|84283 | 0.3911 | 0 | 0 |
SLC38A2\|54407 | 0.366 | 0 | 0 |
KRT1\|3848 | 0.3654 | 8.88178419700125e-16 | 4.2687221279126e-13 |
C1orf74\|148304 | 0.3643 | 0 | 0 |
GJB2\|2706 | 0.3616 | 0 | 0 |
CAPNS2\|84290 | 0.3598 | 1.82076576038526e-14 | 4.01510981325427e-12 |
KIAA1609\|57707 | 0.3505 | 0 | 0 |
PERP\|64065 | 0.3459 | 0 | 0 |
LGALS7\|3963 | 0.3453 | 8.88178419700125e-16 | 4.2687221279126e-13 |
KRT14\|3861 | 0.3437 | 2.59792187762287e-14 | 5.39688440457522e-12 |
PLS3\|5358 | 0.3437 | 0 | 0 |
KRT16\|3868 | 0.3434 | 4.44089209850063e-16 | 2.52242671194836e-13 |
DSC1\|1823 | 0.3418 | 4.93161067538495e-13 | 5.70605620366762e-11 |
IGFL1\|374918 | 0.3407 | 6.79190037544686e-12 | 5.46383607885733e-10 |
FAT2\|2196 | 0.3404 | 2.22044604925031e-16 | 1.80956698900643e-13 |
APOBEC3A\|200315 | 0.3397 | 4.44089209850063e-16 | 2.52242671194836e-13 |
PTHLH\|5744 | 0.3397 | 2.22044604925031e-16 | 1.80956698900643e-13 |
KRT75\|9119 | 0.3394 | 6.00298610820005e-09 | 1.83556234277491e-07 |
IL20RB\|53833 | 0.3393 | 2.22044604925031e-16 | 1.80956698900643e-13 |
LYPD3\|27076 | 0.3385 | 2.22044604925031e-16 | 1.80956698900643e-13 |
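The method behind the p- and q-values in Table 2 is not stated. As a hedged sketch: a Pearson r computed from n samples is commonly tested with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom, and q-values are often Benjamini-Hochberg adjusted p-values; both choices are assumptions here. Python's standard library has no Student t distribution, so converting t to a p-value is left to a statistics library (for example scipy.stats.t.sf). The helper names and toy inputs are hypothetical.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust(method="BH"))."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def top_genes(gene_ids, correlations, pvalues, k=20):
    """Rows (gene, r, p, q) for the k genes with the largest correlation values."""
    qvalues = bh_qvalues(pvalues)
    rows = list(zip(gene_ids, correlations, pvalues, qvalues))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


# Toy values for illustration only.
print(pearson_t_statistic(0.4, 100))
print(top_genes(["A", "B", "C"], [0.1, 0.3, 0.2], [0.5, 0.01, 0.1], k=2))
```

Ranking here is by signed correlation, matching the descending order of Table 2; ranking by absolute value would be an equally plausible alternative the source does not rule out.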
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and the expression of each gene across all samples common to both data sets.
Correlations were computed with the pairwise.complete.obs option, so for each gene only samples with non-missing values for both variables were used.
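The per-gene computation can be sketched as follows, assuming the mutation-load values and each gene's expression values are already aligned to the same list of common samples, with missing values recorded as None or NaN. Dropping any sample missing either value mirrors what pairwise.complete.obs does for a single pair of variables. The function names and the toy data are hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pearson_pairwise_complete(x, y):
    """Pearson r over samples where both x and y are present.

    Returns (r, n_used); r is NaN if fewer than two complete pairs remain
    or either variable is constant over those pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if not (_is_missing(a) or _is_missing(b))]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n


def correlate_all_genes(mutload, expression_by_gene):
    """Map each gene ID to (r, n_used) against the mutation-load vector."""
    return {
        gene: pearson_pairwise_complete(mutload, values)
        for gene, values in expression_by_gene.items()
    }


# Toy data: four common samples, one missing expression value.
mutload = [10.0, 20.0, 30.0, 40.0]
expression = {
    "GENE_A|1": [1.0, 2.0, 3.0, None],
    "GENE_B|2": [4.0, 3.0, 2.0, 1.0],
}
print(correlate_all_genes(mutload, expression))
# → {'GENE_A|1': (1.0, 3), 'GENE_B|2': (-1.0, 4)}
```

Returning the number of samples actually used alongside r matters here: with pairwise deletion, different genes can be tested on different sample counts, which in turn affects their p-values.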