This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.1399, -0.0963, -0.0656, -0.0369, -0.0086, 0.0194, 0.0497, 0.08814, and 0.14597, respectively.
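As an illustration of how such decile cut points can be obtained from a list of per-gene correlation coefficients, here is a minimal Python sketch. The input values are hypothetical, and the interpolation method (`method="inclusive"`) is an assumption; the report does not state which quantile definition the pipeline uses.

```python
import statistics

# Hypothetical per-gene correlation coefficients (not values from this report).
cors = [i / 10 for i in range(-5, 6)]  # -0.5, -0.4, ..., 0.5

# n=10 yields the nine cut points at the 10th, 20th, ..., 90th percentiles.
# method="inclusive" is an assumed choice of interpolation rule.
deciles = statistics.quantiles(cors, n=10, method="inclusive")

for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value:.4f}")
```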
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 lists the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
| Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
|---|---|---|---|
| Sample | 578 | 599 | 554 |
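The "Common" column is the size of the intersection of the two sample sets. A minimal Python sketch of that bookkeeping, using hypothetical sample identifiers rather than the real ones:

```python
# Hypothetical sample identifiers; the real data sets contain 578 and 599 samples.
apobec_samples = {"S01", "S02", "S03", "S04"}
expression_samples = {"S02", "S03", "S04", "S05", "S06"}

# Only samples present in both data sets can contribute to the correlation.
common_samples = apobec_samples & expression_samples

counts = {
    "APOBEC_MutLoad_MinEstimate": len(apobec_samples),
    "Expression": len(expression_samples),
    "Common": len(common_samples),
}
print(counts)
```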
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot comparing the quantiles of the calculated correlation coefficients with those of a normal distribution. Points that deviate from the blue line indicate departure from normality.
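The coordinates of a normal QQ plot can be sketched as follows. This is an illustration only: the plotting positions `(i - 0.5) / n`, the use of the sample mean and standard deviation for the reference normal, and the function name `qq_points` are assumptions, not details taken from the pipeline.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    n = len(values)
    observed = sorted(values)
    # Reference normal fitted to the data (assumed choice).
    reference = NormalDist(mean(values), stdev(values))
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Hypothetical correlation coefficients.
example_cors = [-0.21, -0.10, -0.04, 0.00, 0.03, 0.08, 0.19]
points = qq_points(example_cors)
for theo, obs in points:
    print(f"{theo:8.4f} {obs:8.4f}")
```

Points lying close to the line y = x indicate that the observed quantiles agree with the reference normal.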
Table 2. Top 20 genes ranked by correlation coefficient
| geneID | cor | p-value | q-value |
|---|---|---|---|
| TMEM79\|84283 | 0.3911 | 0 | 0 |
| SLC38A2\|54407 | 0.366 | 0 | 0 |
| KRT1\|3848 | 0.3654 | 8.88178419700125e-16 | 4.2687221279126e-13 |
| C1orf74\|148304 | 0.3643 | 0 | 0 |
| GJB2\|2706 | 0.3616 | 0 | 0 |
| CAPNS2\|84290 | 0.3598 | 1.82076576038526e-14 | 4.01510981325427e-12 |
| KIAA1609\|57707 | 0.3505 | 0 | 0 |
| PERP\|64065 | 0.3459 | 0 | 0 |
| LGALS7\|3963 | 0.3453 | 8.88178419700125e-16 | 4.2687221279126e-13 |
| KRT14\|3861 | 0.3437 | 2.59792187762287e-14 | 5.39688440457522e-12 |
| PLS3\|5358 | 0.3437 | 0 | 0 |
| KRT16\|3868 | 0.3434 | 4.44089209850063e-16 | 2.52242671194836e-13 |
| DSC1\|1823 | 0.3418 | 4.93161067538495e-13 | 5.70605620366762e-11 |
| IGFL1\|374918 | 0.3407 | 6.79190037544686e-12 | 5.46383607885733e-10 |
| FAT2\|2196 | 0.3404 | 2.22044604925031e-16 | 1.80956698900643e-13 |
| APOBEC3A\|200315 | 0.3397 | 4.44089209850063e-16 | 2.52242671194836e-13 |
| PTHLH\|5744 | 0.3397 | 2.22044604925031e-16 | 1.80956698900643e-13 |
| KRT75\|9119 | 0.3394 | 6.00298610820005e-09 | 1.83556234277491e-07 |
| IL20RB\|53833 | 0.3393 | 2.22044604925031e-16 | 1.80956698900643e-13 |
| LYPD3\|27076 | 0.3385 | 2.22044604925031e-16 | 1.80956698900643e-13 |
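The report does not state how the q-values were derived from the p-values. One common approach to multiple-testing adjustment is the Benjamini-Hochberg procedure; the Python sketch below shows that procedure purely as an assumption, with hypothetical p-values, and should not be read as the pipeline's actual method.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for four genes.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```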
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used in this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and each gene were calculated across all samples common to both data sets.
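Pearson correlations were computed with the pairwise.complete.obs option, so each gene's coefficient uses only the samples in which both the APOBEC_MutLoad_MinEstimate value and the expression value are present.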
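The pairwise-complete calculation can be sketched in Python as follows. This is a minimal illustration, not the pipeline's code: the function name `pearson_pairwise_complete`, the encoding of missing values as `None` or NaN, and the toy data are all assumptions.

```python
import math

def pearson_pairwise_complete(x, y):
    """Pearson r using only positions where both x and y are present."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Drop any sample where either value is missing.
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a) and not is_missing(b)]
    n = len(pairs)
    if n < 2:
        return float("nan")

    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for one gene across five samples.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r = pearson_pairwise_complete(mut_load, expression)
print(r)
```

In this toy example only three samples have both values, and they lie on a straight line, so the coefficient is 1 up to floating-point rounding.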