This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC mutation load is also associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.08786, -0.0564, -0.0343, -0.01464, 0.0046, 0.02354, 0.0445, 0.0688, and 0.1022, respectively.
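Percentile summaries like these can be obtained directly from the vector of per-gene coefficients. The sketch below is illustrative only: the function name `correlation_percentiles` is hypothetical, it assumes linear interpolation between order statistics (NumPy's default, which matches R's default `quantile` type 7), and it ignores genes whose coefficient is missing. The pipeline's own percentile method is not stated.

```python
import numpy as np


def correlation_percentiles(r, qs=range(10, 100, 10)):
    """Return the requested percentiles of per-gene correlation coefficients.

    Hypothetical helper. NaN entries (for example, genes with too few
    paired observations) are ignored; values between order statistics
    are linearly interpolated (NumPy's default method).
    """
    r = np.asarray(r, dtype=float)
    return {q: float(np.nanpercentile(r, q)) for q in qs}


# Toy coefficients for illustration only, not the study's data.
toy = [-0.2, -0.1, 0.0, 0.1, 0.2, float("nan")]
print(correlation_percentiles(toy, qs=[10, 50, 90]))
```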
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 lists the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 395 | 408 | 391 |
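The "Common" column is the size of the intersection of the two sample sets; only those samples enter the correlation. A minimal sketch, assuming sample identifiers have already been harmonized between the two data sets (how the pipeline matches TCGA barcodes is not described here). The function name `sample_counts` and the toy IDs are hypothetical.

```python
def sample_counts(mutload_ids, expression_ids):
    """Count samples in each data set and in their intersection.

    Hypothetical helper; assumes both inputs use the same identifier format.
    Returns the counts (as in Table 1) and the sorted list of shared IDs.
    """
    a, b = set(mutload_ids), set(expression_ids)
    common = sorted(a & b)
    counts = {
        "APOBEC_MutLoad_MinEstimate": len(a),
        "Expression": len(b),
        "Common": len(common),
    }
    return counts, common


# Toy identifiers for illustration only.
counts, common = sample_counts(["S1", "S2", "S3"], ["S2", "S3", "S4", "S5"])
print(counts, common)
```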
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution; points deviating from the blue line indicate deviation from normality.
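The points of a normal QQ plot pair each sorted coefficient with a theoretical standard-normal quantile. The sketch below assumes plotting positions (i - 0.5)/n, which is what R's `ppoints` (used by `qqnorm`) returns for n > 10; the figure's exact settings and how its blue reference line is drawn are not stated, so this is an outline rather than a reproduction. The function name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair sorted sample values with standard-normal theoretical quantiles.

    Hypothetical helper. NaN values are dropped (NaN != NaN), and plotting
    positions (i - 0.5) / n are assumed for i = 1..n.
    Returns a list of (theoretical_quantile, sample_value) pairs.
    """
    x = sorted(v for v in values if v == v)
    n = len(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, x))


# Toy coefficients for illustration only.
for t, s in normal_qq_points([0.3, -0.1, float("nan"), 0.2]):
    print(round(t, 3), s)
```

If the coefficients were normally distributed, these pairs would fall close to a straight line.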

Table 2. Top 20 genes ranked by correlation coefficient
Gene | Entrez Gene ID | r | p-value | q-value |
---|---|---|---|---|
OR52N2 | 390077 | 0.3681 | 6.10396429978621e-06 | 0.00585177419582136 |
KIR2DS4 | 3809 | 0.3598 | 7.56890834452406e-08 | 0.000689338327477529 |
KIR3DL1 | 3811 | 0.3408 | 2.67395309716534e-06 | 0.00401560323703896 |
PRSS41 | 360226 | 0.3355 | 4.19513075264888e-05 | 0.0155947564611223 |
COX7B2 | 170712 | 0.3091 | 9.58805802802054e-05 | 0.0203077298814412 |
SSX4 | 6759 | 0.3031 | 0.000154988847433257 | 0.0247642268069893 |
KIR2DL3 | 3804 | 0.2976 | 8.19794125450635e-06 | 0.00678752272503787 |
TTPA | 7274 | 0.2975 | 2.24549485006165e-06 | 0.00401560323703896 |
KLRC4 | 8302 | 0.2894 | 1.78125919383376e-05 | 0.00983201097444908 |
SSX6 | 280657 | 0.2794 | 0.000696487314185923 | 0.0428356936417541 |
TUBA3C | 7278 | 0.2784 | 0.000697874079978789 | 0.0428356936417541 |
KIAA1841 | 84542 | 0.2753 | 3.11988452850187e-08 | 0.000568286966866616 |
KIR2DL1 | 3802 | 0.272 | 0.000425241442910451 | 0.0364980540676649 |
GPR12 | 2835 | 0.2693 | 0.000115563702312249 | 0.0211532868231449 |
CLNK | 116449 | 0.2637 | 0.000119005385876392 | 0.0214622089479057 |
KLRC3 | 3823 | 0.2635 | 3.08638184565169e-06 | 0.00401560323703896 |
RFPL4B | 442247 | 0.26 | 0.000834390042969702 | 0.0464783322100707 |
IFNG | 3458 | 0.2581 | 3.3423900167584e-06 | 0.00405877561035028 |
APOC1 | 341 | 0.2475 | 7.2217283220155e-07 | 0.00328859453463781 |
CSAG3 | 389903 | 0.2463 | 3.00853115065003e-06 | 0.00401560323703896 |
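The report does not say how the q-values were computed. One common choice is the Benjamini-Hochberg false discovery rate adjustment; its cumulative-minimum step can give several genes the same q-value, which is consistent with the repeated value 0.00401560323703896 in Table 2, but this remains an assumption. The sketch below follows the same definition as R's `p.adjust(method = "BH")`; the function name `bh_qvalues` is hypothetical. Note that the q-values in Table 2 are adjusted over all tested genes, so they cannot be recomputed from these 20 rows alone.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper).

    For the p-value with ascending rank k out of n, the adjusted value is
    min over j >= k of p_(j) * n / j, capped at 1.
    """
    n = len(pvals)
    # Walk from the largest p-value down so a running minimum enforces monotonicity.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    q = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        q[i] = running_min
    return q


# Toy p-values for illustration only.
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```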
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used in this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and each gene were calculated across all samples common to both data sets.
Missing values were handled with the pairwise.complete.obs option, so each coefficient is computed only from the samples that have non-missing values for both APOBEC_MutLoad_MinEstimate and that gene.
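The pipeline's own code is not shown here. The sketch below is a NumPy re-implementation of the same idea under stated assumptions: expression is a genes-by-samples matrix whose columns are already aligned with the mutation-load vector, missing values are NaN, and, as with pairwise.complete.obs, each gene uses only samples where both values are present. The function name `per_gene_pearson` and the minimum of 3 complete pairs are hypothetical choices, not taken from the pipeline.

```python
import numpy as np


def per_gene_pearson(load, expr, min_pairs=3):
    """Pearson r between one sample-level score and every gene.

    Hypothetical helper.
    load: shape (n_samples,) APOBEC mutation-load values, NaN if missing.
    expr: shape (n_genes, n_samples) expression values aligned with load.
    Returns (r, n): r is NaN when fewer than min_pairs complete pairs exist
    or when either variable is constant over those pairs.
    """
    load = np.asarray(load, dtype=float)
    expr = np.asarray(expr, dtype=float)
    r = np.full(expr.shape[0], np.nan)
    n = np.zeros(expr.shape[0], dtype=int)
    for g, row in enumerate(expr):
        ok = ~np.isnan(row) & ~np.isnan(load)  # pairwise complete observations
        n[g] = int(ok.sum())
        if n[g] < min_pairs:
            continue
        xc = load[ok] - load[ok].mean()
        yc = row[ok] - row[ok].mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
        if denom > 0:
            r[g] = (xc * yc).sum() / denom
    return r, n


# Toy data for illustration only: 3 genes x 5 samples.
nan = float("nan")
r, n = per_gene_pearson([1, 2, 3, 4, nan],
                        [[2, 4, 6, 8, 10], [4, 3, 2, 1, 0], [1, nan, 1, nan, 1]])
print(r, n)
```

A two-sided p-value can then be derived from t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom (for example with `scipy.stats.t.sf`); whether the pipeline computed its p-values this way is not stated.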