This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.
The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0567, -0.0398, -0.026, -0.0142, -0.0036, 0.007, 0.0182, 0.0315, and 0.0499, respectively.
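The sketch below shows one way to compute decile summaries like these from the vector of per-gene correlation coefficients, using Python's standard `statistics.quantiles`. Tools differ in how they interpolate percentiles, and the report does not say which convention it used. The `method="inclusive"` choice (linear interpolation between order statistics) is an assumption, and `correlation_deciles` is a hypothetical helper name.

```python
import statistics


def correlation_deciles(coefs):
    """Return the 10th, 20th, ..., 90th percentiles of a sequence of correlation coefficients."""
    # n=10 yields the nine cut points that split the data into ten equal-sized groups;
    # "inclusive" interpolates linearly between order statistics (assumed convention).
    return statistics.quantiles(coefs, n=10, method="inclusive")


# Usage with toy values (not the report's data):
print(correlation_deciles([0.05, -0.02, 0.0, 0.01, -0.04, 0.03, -0.01, 0.02, -0.03, 0.04, -0.05]))
```

Input order does not matter, because `statistics.quantiles` sorts the data internally.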
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of these coefficients against a normal distribution. Table 2 lists the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and expression data sets, and common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 978 | 1093 | 974 |
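The Common count is the size of the intersection of the two sample sets. The sketch below assumes samples are matched on the first 15 characters of their TCGA barcodes, which is the sample-level portion, for example `TCGA-A1-A0SB-01`. The report does not say how samples were matched. `sample_key` and `sample_counts` are hypothetical names, and the barcodes in the usage example are made up.

```python
def sample_key(barcode: str) -> str:
    # Assumption: match at the TCGA sample level, i.e. the first 15 characters
    # of the barcode (participant plus sample type, e.g. "TCGA-A1-A0SB-01").
    return barcode[:15]


def sample_counts(mutload_ids, expression_ids):
    """Count samples in each data set and in their intersection (cf. Table 1)."""
    mutload = {sample_key(b) for b in mutload_ids}
    expression = {sample_key(b) for b in expression_ids}
    return {
        "APOBEC_MutLoad_MinEstimate": len(mutload),
        "Expression": len(expression),
        "Common": len(mutload & expression),
    }


# Usage with made-up barcodes (hypothetical, not from the report):
mut = ["TCGA-AA-0001-01", "TCGA-AA-0002-01", "TCGA-AA-0003-01"]
expr = ["TCGA-AA-0001-01A-11R-A000-07", "TCGA-AA-0002-01A-11R-A000-07",
        "TCGA-AA-0004-01A-11R-A000-07", "TCGA-AA-0005-11A-11R-A000-07"]
print(sample_counts(mut, expr))
# {'APOBEC_MutLoad_MinEstimate': 3, 'Expression': 4, 'Common': 2}
```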
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the calculated correlation coefficients. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution; points deviating from the blue line indicate deviation from normality.
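A QQ plot pairs each sorted correlation coefficient with the matching quantile of a standard normal distribution. The sketch below rests on two assumptions. It uses plotting positions (i - 0.5)/n, and it draws the reference line through the first- and third-quartile points, which is the default of R's `qqline`. The report does not say how the blue line was drawn, and the function names are hypothetical.

```python
import statistics
from statistics import NormalDist


def normal_qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs for a QQ plot."""
    observed = sorted(values)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def quartile_reference_line(values):
    """Intercept and slope of a line through the first- and third-quartile points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    z1, z3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    return q1 - slope * z1, slope
```

If the coefficients were exactly normal, the points would fall close to this line. Heavy tails, such as the handful of strongly correlated genes in Table 2, show up as points bending away from it at the ends.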

Table 2. Top 20 genes ranked by correlation coefficient
Gene | Entrez Gene ID | r | p-value | q-value |
---|---|---|---|---|
HFE2 | 148738 | 0.3017 | 4.92876051083613e-09 | 4.50883011531289e-05 |
CACNA1S | 779 | 0.266 | 3.46300125286803e-06 | 0.00289035671378529 |
ABRA | 137735 | 0.2251 | 3.20085347116361e-07 | 0.000532389228258268 |
TBX10 | 347853 | 0.1932 | 0.000293478731580699 | 0.0431638279629016 |
PROKR1 | 10887 | 0.1926 | 2.19678084611985e-05 | 0.00803846047212176 |
OR1Q1 | 158131 | 0.1874 | 0.000533310389485564 | 0.0627924417907852 |
CHRNA4 | 1137 | 0.1835 | 0.000168260230674289 | 0.0292741389024907 |
SUSD2 | 56241 | 0.1796 | 1.65089473203039e-08 | 7.55119250430702e-05 |
CDH10 | 1008 | 0.177 | 0.00027573679993953 | 0.0413514794401118 |
SNX11 | 29916 | 0.1727 | 5.80909835790067e-08 | 0.000151833233651644 |
ASB5 | 140458 | 0.172 | 0.00160128154558414 | 0.106534716938209 |
OVCH1 | 341350 | 0.1719 | 0.00156694912835098 | 0.106487689515663 |
LOC400891 | 400891 | 0.1701 | 0.00114086424729853 | 0.091549352055149 |
FBXO40 | 51725 | 0.167 | 0.000410011890184858 | 0.0531349807878518 |
KCNT1 | 57582 | 0.1618 | 5.88433384707265e-06 | 0.00371240593331177 |
MYL2 | 4633 | 0.1593 | 0.00164242111564783 | 0.108482804086255 |
TMC1 | 117531 | 0.159 | 2.89909036019775e-05 | 0.00959964512502387 |
SLC10A1 | 6554 | 0.1579 | 0.00136000789368351 | 0.099134280569058 |
TEX11 | 56159 | 0.1579 | 3.38379499642549e-06 | 0.00289035671378529 |
SEPX1 | 51734 | 0.1563 | 9.474744027127e-07 | 0.00131686632153752 |
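The report does not state how the q-values were obtained. One common choice is the Benjamini-Hochberg false discovery rate adjustment. The sketch below assumes that method: it computes q-values over all tested genes and then keeps the k genes with the largest coefficients. `benjamini_hochberg` and `top_genes` are hypothetical names.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the q-values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def top_genes(results, k=20):
    """results: list of (gene, r, p) for ALL tested genes; returns top-k rows by r."""
    # Adjust over every tested gene first; adjusting only the top k would understate q.
    qvalues = benjamini_hochberg([p for _, _, p in results])
    rows = [(gene, r, p, q) for (gene, r, p), q in zip(results, qvalues)]
    rows.sort(key=lambda row: row[1], reverse=True)  # largest coefficient first
    return rows[:k]
```

The adjustment must run before the top-k selection. The q-value depends on the total number of tests m, so a gene's q-value reflects every gene tested, not just those displayed.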
Gene-level (TCGA Level III) mRNAseq expression data and the APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and the expression of each gene were calculated across all samples common to both data sets.
Missing values were handled with the pairwise.complete.obs option, so each coefficient was computed only from samples with non-missing values for both variables.
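The sketch below is a minimal pure-Python per-gene Pearson correlation that mirrors R's `cor(x, y, use = "pairwise.complete.obs")`. Any position where either value is missing is dropped before r is computed; here `None` or NaN marks a missing value, which is an assumed encoding. The function also returns the usual test statistic t = r·sqrt((n - 2)/(1 - r²)), which follows a t distribution with n - 2 degrees of freedom when the true correlation is zero. Converting t to a p-value needs a t-distribution CDF (for example from SciPy), which is omitted here. All function names are hypothetical.

```python
import math


def _present(value):
    # Assumed missing-value encoding: None or float NaN.
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both x and y are present.

    Returns (r, n, t): the coefficient, the number of complete pairs, and the
    test statistic t, which has n - 2 degrees of freedom under H0: rho = 0.
    """
    pairs = [(a, b) for a, b in zip(x, y) if _present(a) and _present(b)]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, n, math.nan  # r is undefined for a constant variable
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, n, t


def correlate_genes(mutload, expression):
    """expression: dict of gene -> values aligned sample-by-sample with mutload."""
    return {gene: pearson_pairwise_complete(mutload, values) for gene, values in expression.items()}
```

Because missing values are dropped pair by pair, n can differ from gene to gene. Two genes with similar r can therefore have quite different p-values.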