This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0564, -0.0395, -0.0259, -0.0141, -0.0035, 0.0069, 0.018, 0.0314, and 0.0498, respectively.
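As a rough illustration of how such a percentile summary can be produced, the sketch below computes the nine decile cut points of a list of per-gene coefficients using only the Python standard library. The function name `decile_summary`, the example values, and the choice of the "inclusive" interpolation method are assumptions made for illustration; the report does not say which quantile definition the pipeline uses.

```python
import statistics

def decile_summary(correlations):
    """Map 10, 20, ..., 90 to the matching percentile of the coefficients."""
    # quantiles(n=10) returns the 9 cut points between the 10 equal-sized groups
    cuts = statistics.quantiles(correlations, n=10, method="inclusive")
    return {10 * (i + 1): round(cut, 4) for i, cut in enumerate(cuts)}

# Hypothetical per-gene correlation coefficients (not taken from the report)
example = [-0.08, -0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.04, 0.06, 0.09, 0.12]
summary = decile_summary(example)
for percentile, value in summary.items():
    print(f"{percentile}th percentile: {value}")
```

Different quantile definitions give slightly different cut points on small inputs, so exact agreement with the values reported above should not be expected from this sketch.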
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 978 | 1093 | 974 |
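The counts in Table 1 amount to two set sizes and the size of their intersection. A minimal sketch, assuming each data set is identified by a list of sample IDs (the function name `sample_overlap` and the IDs shown are hypothetical):

```python
def sample_overlap(load_ids, expr_ids):
    """Count samples in each data set and the samples present in both."""
    load, expr = set(load_ids), set(expr_ids)
    return {
        "APOBEC_MutLoad_MinEstimate": len(load),
        "Expression": len(expr),
        "Common": len(load & expr),  # only these samples enter the correlation
    }

# Hypothetical sample identifiers, for illustration only
load_ids = ["S01", "S02", "S03", "S05"]
expr_ids = ["S02", "S03", "S04", "S05", "S06"]
counts = sample_overlap(load_ids, expr_ids)
print(counts)
```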
Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlations across samples for all genes. Right: QQ plot of the calculated correlations across samples. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
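The coordinates behind a QQ plot like the right panel can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's plotting code: the reference normal is fitted to the sample mean and standard deviation, and the plotting positions are taken as (i + 0.5) / n. Neither choice is specified in the report, and the function name `qq_points` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    observed = sorted(values)
    n = len(observed)
    # Assumption: reference normal fitted to the sample mean and standard deviation
    reference = NormalDist(mean(observed), stdev(observed))
    # Assumption: plotting positions (i + 0.5) / n for i = 0 .. n-1
    return [(reference.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(observed)]

# Hypothetical correlation coefficients, for illustration only
coefficients = [-0.05, -0.02, 0.0, 0.01, 0.06]
points = qq_points(coefficients)
for theoretical, obs in points:
    print(f"{theoretical:+.4f}  {obs:+.4f}")
```

If the coefficients were normally distributed, the pairs would fall close to the line where the theoretical and observed quantiles are equal.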

Table 2. Top 20 genes ranked by correlation coefficient
Gene symbol | Entrez gene ID | cor | p-value | q-value |
---|---|---|---|---|
HFE2 | 148738 | 0.3074 | 2.43114595122051e-09 | 2.22401231617653e-05 |
CACNA1S | 779 | 0.268 | 2.91497706417232e-06 | 0.00280696949295246 |
ABRA | 137735 | 0.2343 | 1.00563434024536e-07 | 0.000229988573614115 |
TBX10 | 347853 | 0.1925 | 0.000309245891950916 | 0.0463767445830652 |
PROKR1 | 10887 | 0.1918 | 2.37452006577321e-05 | 0.00868884382467734 |
OR1Q1 | 158131 | 0.1863 | 0.000574190064553726 | 0.0659842181582006 |
CHRNA4 | 1137 | 0.182 | 0.000189695270141232 | 0.0324675201750846 |
SUSD2 | 56241 | 0.179 | 1.86831059512116e-08 | 8.54565266208418e-05 |
CDH10 | 1008 | 0.1753 | 0.000315819193703604 | 0.0465986126451704 |
ASB5 | 140458 | 0.1752 | 0.00130308763832487 | 0.0996150070466432 |
SNX11 | 29916 | 0.1721 | 6.5118462622138e-08 | 0.000170201056019234 |
OVCH1 | 341350 | 0.1719 | 0.00155976586092477 | 0.108096500725301 |
FBXO40 | 51725 | 0.1678 | 0.000383190373016262 | 0.0535179470588209 |
LOC400891 | 400891 | 0.1649 | 0.00162179682096375 | 0.110306299763393 |
MYL2 | 4633 | 0.1638 | 0.0012057557025793 | 0.0955548580695832 |
KCNT1 | 57582 | 0.1614 | 6.24746180877267e-06 | 0.00394150211218292 |
TEX11 | 56159 | 0.1576 | 3.53917629292155e-06 | 0.0030834652121568 |
SLC10A1 | 6554 | 0.1574 | 0.00140275816386604 | 0.103151994109108 |
TMC1 | 117531 | 0.1573 | 3.55047654312379e-05 | 0.0111999170401712 |
SEPX1 | 51734 | 0.156 | 1.00074233921887e-06 | 0.00143152409696613 |
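The report does not state how the q-values in Table 2 were derived from the p-values. One common multiple-testing adjustment is the Benjamini-Hochberg procedure, sketched below purely as an assumed example of how a q-value column could be obtained; the function name `benjamini_hochberg` and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values, for illustration only
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Because the adjustment depends on the rank of each p-value among all tested genes, it cannot be reproduced from the 20 rows of Table 2 alone.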
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and the expression of each gene were calculated across all samples common to both data sets.
Correlations were computed with the pairwise.complete.obs option, so each gene's coefficient uses only the samples in which both values are present.
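The pairwise-complete behavior can be illustrated with a short sketch. It mirrors what R's `cor(..., use = "pairwise.complete.obs")` does for a single pair of vectors: pairs with a missing value on either side are dropped before the Pearson coefficient is computed. The function name `pearson_pairwise_complete`, the use of `None`/NaN to mark missing values, and the example data are assumptions for illustration, not the pipeline's code.

```python
import math

def pearson_pairwise_complete(x, y):
    """Pearson r over pairs where both values are present; returns (r, n_used)."""
    pairs = [
        (a, b)
        for a, b in zip(x, y)
        if a is not None and b is not None
        and not (math.isnan(a) or math.isnan(b))
    ]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical mutation load and expression values with missing entries
load = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r, n_used = pearson_pairwise_complete(load, expression)
print(r, n_used)
```

One consequence of this option is that the number of samples behind each coefficient can differ from gene to gene, which is why similar coefficients may carry different p-values.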