This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is also associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0969, -0.0672, -0.0458, -0.0256, -0.0072, 0.0123, 0.0337, 0.0583, and 0.0912, respectively.
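As an illustration of how such a percentile summary can be obtained, the sketch below computes the nine decile cut points of a list of per-gene coefficients with Python's standard library. The input values and the function name `decile_cut_points` are hypothetical, and the interpolation method is an assumption; the report does not say which quantile definition the pipeline uses.

```python
import statistics

def decile_cut_points(values):
    """Return the 9 cut points (10th to 90th percentiles) of values."""
    # method="inclusive" is an assumption; it treats the list as the whole
    # population of coefficients rather than a sample from a larger one.
    return statistics.quantiles(values, n=10, method="inclusive")

# Hypothetical per-gene correlation coefficients, for illustration only.
example_cors = [-0.12, -0.08, -0.05, -0.03, -0.01, 0.0,
                0.02, 0.04, 0.07, 0.10, 0.15]

deciles = decile_cut_points(example_cors)
for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value:.4f}")
```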
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 lists the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 496 | 501 | 490 |
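The "Common" count in Table 1 is the number of samples present in both data sets. A minimal sketch of that intersection follows; the sample identifiers and the function name `common_samples` are hypothetical, and the pipeline's actual matching rules (for example, how barcodes are normalized) are not described in this report.

```python
def common_samples(load_samples, expression_samples):
    """Return the sorted sample IDs present in both data sets."""
    return sorted(set(load_samples) & set(expression_samples))

# Hypothetical sample identifiers, for illustration only.
load_ids = ["S01", "S02", "S03", "S05"]
expression_ids = ["S02", "S03", "S04", "S05", "S06"]

shared = common_samples(load_ids, expression_ids)
print(len(load_ids), len(expression_ids), len(shared))
```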
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients, in which their quantiles are plotted against the corresponding quantiles of a normal distribution. Points that fall away from the blue line indicate deviation from normality.
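The QQ comparison described in the caption can be sketched as follows: each sorted coefficient is paired with the quantile of a normal distribution at the same plotting position. This is only an illustration. Fitting the normal to the sample mean and standard deviation, the plotting positions `(i - 0.5) / n`, the function name `qq_points`, and the input values are all assumptions rather than details taken from the pipeline.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted value with the matching quantile of a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: compare against a normal with the sample mean and stdev.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Assumption: plotting positions (i - 0.5) / n, one common convention.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical correlation coefficients, for illustration only.
example_points = qq_points([0.3, -0.1, 0.0, 0.2, -0.4])
for theoretical_q, observed_q in example_points:
    print(f"{theoretical_q:8.4f} {observed_q:8.4f}")
```

If the coefficients were normally distributed, these pairs would lie close to a straight line.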

Table 2. Top 20 genes ranked by correlation coefficient
Gene symbol | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
C19orf30 | 284424 | 0.2996 | 0.000160091990133449 | 0.0150002859505247 |
CCL1 | 6346 | 0.2995 | 5.65642783620923e-05 | 0.0087343957313152 |
C4orf17 | 84103 | 0.2723 | 0.00038586307187094 | 0.0214090315973816 |
IL1F7 | 27178 | 0.265 | 0.000149455708633228 | 0.0146125445560423 |
CYP2C9 | 1559 | 0.2496 | 0.00122584038559381 | 0.0379100671899525 |
PSG6 | 5675 | 0.2485 | 0.000584068984720965 | 0.0262029951000752 |
CEACAM8 | 1088 | 0.2479 | 0.000250374843245638 | 0.017748007449079 |
RETN | 56729 | 0.2422 | 3.91832670065817e-06 | 0.00237148529631148 |
IL1F9 | 56300 | 0.2382 | 0.00200035313776015 | 0.047411060142106 |
CAPN2 | 824 | 0.2363 | 1.20891728627726e-07 | 0.000543710549503196 |
AZU1 | 566 | 0.2307 | 4.37725124169397e-06 | 0.00246083593243983 |
IFNB1 | 3456 | 0.23 | 0.00179067891770845 | 0.0440293613802182 |
CKLF | 51192 | 0.2283 | 3.26873343059475e-07 | 0.000980075240273326 |
GABRA1 | 2554 | 0.2267 | 3.13556092081946e-05 | 0.00714034695766355 |
FAM163B | 642968 | 0.225 | 0.00387739465356907 | 0.0650962987670342 |
CCL24 | 6369 | 0.2196 | 0.00156083418062947 | 0.0414762288176132 |
SLC20A1 | 6574 | 0.2178 | 1.13186721684144e-06 | 0.00196856583113628 |
CACYBP | 27101 | 0.2155 | 1.46597339689691e-06 | 0.00196856583113628 |
FCRLB | 127943 | 0.2149 | 1.58577323894349e-06 | 0.00196856583113628 |
ADRA1A | 148 | 0.2148 | 2.88223659161702e-05 | 0.00691352483775868 |
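This report does not state how the q-values in Table 2 were derived from the p-values. As one common possibility, and purely as an assumption, the sketch below applies the Benjamini-Hochberg step-up adjustment. The function name and the example p-values are hypothetical. A step-up adjustment of this kind can assign the same q-value to several different p-values, which is at least consistent with the three genes in Table 2 that share q = 0.00196856583113628.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values, in the input order."""
    n = len(p_values)
    # Indices of the p-values, from smallest to largest p.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * n / rank over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical p-values, for illustration only.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(example_q)
```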
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.
Correlations were computed with the pairwise.complete.obs option, that is, for each gene only the samples with non-missing values for both variables were used.
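The pairwise-complete calculation can be illustrated with a small standard-library sketch: pairs in which either value is missing are dropped before the Pearson coefficient is computed. The function name, the use of `None` or NaN to mark missing values, and the example data are assumptions made for illustration; this is not the pipeline's own code.

```python
import math

def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both values are present.

    Returns (r, n), where n is the number of complete pairs used.
    """
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    pairs = [(a, b) for a, b in zip(x, y) if present(a) and present(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # r is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical values for one gene; None and NaN mark missing measurements.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r, n_used = pearson_pairwise_complete(mut_load, expression)
print(r, n_used)
```

Because missing values are handled gene by gene, the number of samples behind each coefficient can differ between genes. That may be why the p-values in Table 2 do not decrease strictly with the correlation coefficient.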