This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC mutation load is associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.14214, -0.0984, -0.06712, -0.0422, -0.0178, 0.0059, 0.031, 0.061, and 0.101, respectively.
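The percentile summary above can be reproduced from the vector of per-gene coefficients. The following is a minimal Python sketch that assumes the coefficients are already in a plain list. The example values are made-up placeholders, not the data behind this report. `method="inclusive"` interpolates linearly between order statistics, and the pipeline's own quantile definition is not stated, so its results may differ slightly.

```python
import statistics

def decile_cutpoints(coefficients):
    """Return the 10th, 20th, ..., 90th percentiles of the correlation coefficients.

    statistics.quantiles with n=10 yields the nine cut points between deciles.
    method="inclusive" interpolates linearly between order statistics; the
    pipeline's own quantile definition is an unknown and may differ.
    """
    clean = [c for c in coefficients if c is not None]  # skip genes with no coefficient
    return statistics.quantiles(clean, n=10, method="inclusive")

# Placeholder coefficients for illustration only (not the values behind the report).
example = [-0.30, -0.20, -0.10, -0.05, 0.00, 0.02, 0.05, 0.10, 0.20, 0.35, 0.48]
for pct, value in zip(range(10, 100, 10), decile_cutpoints(example)):
    print(f"{pct}th percentile: {value:.4f}")
```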
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 lists the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 194 | 304 | 193 |
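Only samples present in both data sets enter the correlation, so the Common count (193) is smaller than either input. Below is a minimal sketch of that matching step. It assumes both inputs are keyed by TCGA sample barcodes that are truncated to a common prefix before matching. The barcodes and the 15-character prefix (participant plus sample-type code) are illustrative assumptions, not details stated by the pipeline.

```python
def common_samples(mutload_ids, expression_ids, prefix_len=15):
    """Return the sorted sample IDs present in both data sets.

    IDs are truncated to prefix_len characters before matching (a hypothetical
    normalization: the first 15 characters of a TCGA barcode cover participant
    and sample type), so aliquot-level suffixes do not prevent a match.
    """
    mut = {s[:prefix_len] for s in mutload_ids}
    expr = {s[:prefix_len] for s in expression_ids}
    return sorted(mut & expr)

# Hypothetical barcodes for illustration.
mutload = ["TCGA-AA-0001-01A-11D-A000-08", "TCGA-AA-0002-01A-11D-A000-08"]
expression = ["TCGA-AA-0002-01A-21R-A000-07", "TCGA-AA-0003-01A-21R-A000-07"]
print(common_samples(mutload, expression))  # ['TCGA-AA-0002-01']
```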
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients, plotting their quantiles against those of a normal distribution. Points deviating from the blue line indicate departure from normality.
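A QQ plot pairs each sorted coefficient with the matching quantile of a reference normal distribution. The sketch below shows one way to compute those coordinates. It assumes plotting positions (i - 0.5)/n and a reference normal with the sample mean and standard deviation. The report does not say which plotting positions were used or how the blue line was drawn, and the input values are placeholders.

```python
import statistics
from statistics import NormalDist

def qq_points(coefficients):
    """Return (theoretical, observed) pairs for a normal QQ plot.

    Theoretical quantiles come from a normal distribution with the sample's
    mean and standard deviation, evaluated at plotting positions (i - 0.5) / n.
    With this scaling, points far from the line y = x suggest non-normality.
    """
    observed = sorted(coefficients)
    n = len(observed)
    ref = NormalDist(statistics.fmean(observed), statistics.stdev(observed))
    theoretical = [ref.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Placeholder coefficients for illustration only.
pts = qq_points([-0.14, -0.10, -0.07, -0.04, -0.02, 0.01, 0.03, 0.06, 0.10])
for t, o in pts:
    print(f"{t:+.4f}  {o:+.4f}")
```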

Table 2. Top 20 genes ranked by correlation coefficients
Gene | Entrez ID | cor | p-value | q-value |
---|---|---|---|---|
MYL7 | 58498 | 0.4852 | 5.84671357772848e-06 | 0.0118149088709053 |
PRSS37 | 136242 | 0.4703 | 0.000149425389998292 | 0.0438322510951442 |
MAGEA4 | 4103 | 0.3741 | 0.000357731367427894 | 0.0625846835070673 |
ADCYAP1R1 | 117 | 0.363 | 0.00253461389236964 | 0.14133976848833 |
PRTN3 | 5657 | 0.3625 | 0.0032438936092527 | 0.152677746344702 |
MAGEA8 | 4107 | 0.3612 | 5.04354682537844e-05 | 0.0269785253273993 |
C3orf49 | 132200 | 0.3567 | 0.00554816038790729 | 0.181668944913246 |
GJB7 | 375519 | 0.3537 | 1.69315037301665e-05 | 0.0153448213690089 |
SSX4 | 6759 | 0.3275 | 0.006013199160531 | 0.187033783856913 |
PGPEP1L | 145814 | 0.32 | 9.23039321381047e-05 | 0.0349735752874106 |
TAS2R13 | 50838 | 0.3081 | 0.0176144985265181 | 0.273302964548592 |
PDP1 | 54704 | 0.3064 | 1.46900584154785e-05 | 0.0153448213690089 |
KIR3DL3 | 115653 | 0.3034 | 0.00214751446585359 | 0.135928921714178 |
RFPL4A | 342931 | 0.3009 | 0.0096820305522165 | 0.222383736120283 |
TRIM17 | 51127 | 0.2968 | 2.77203142649007e-05 | 0.0195251528621723 |
LOC100271832 | 100271832 | 0.2881 | 0.0148331201626304 | 0.256191791450863 |
GBP5 | 115362 | 0.2831 | 6.62581840529253e-05 | 0.0314674224439581 |
COX8C | 341947 | 0.2816 | 0.0103872067642137 | 0.226986527155563 |
P2RY6 | 5031 | 0.2816 | 7.28361894841711e-05 | 0.0327766360846321 |
C1orf49 | 84066 | 0.2805 | 0.0178281335883055 | 0.273357709320204 |
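Table 2 is ordered by cor, yet the p-values do not fall in the same order. For example, PRSS37 (cor 0.4703) has a larger p-value than PDP1 (cor 0.3064). The usual test for a Pearson coefficient converts r into a t statistic with n - 2 degrees of freedom: t = r * sqrt(n - 2) / sqrt(1 - r^2). Its significance therefore depends on the number of complete pairs n as well as on r. Because pairwise.complete.obs lets each gene use a different number of samples, differing per-gene n is one plausible explanation. However, the report lists neither per-gene n nor the exact test used. The stdlib-only sketch below computes the statistic; the n values in the example are hypothetical.

```python
import math

def pearson_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom.

    Assumes the standard Student-t test for a Pearson coefficient; the
    two-sided p-value would be the tail probability of |t| under t(n - 2).
    """
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation: unbounded t
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Same coefficient, different numbers of complete pairs (hypothetical n values).
for n in (30, 80, 193):
    print(n, round(pearson_t_statistic(0.47, n), 3))
```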
Gene-level (TCGA Level III) mRNAseq expression data and the APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and each gene were calculated across all samples common to both data sets.
Missing values were handled with the pairwise.complete.obs option, so each gene's coefficient is computed only from the samples that have non-missing values for both variables.
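The correlation step can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline's code. It assumes mutation load and expression are held in plain dicts keyed by sample ID, with None marking a missing value. For each gene it keeps only the samples where both values are present, which is what pairwise.complete.obs means, and then ranks genes by coefficient as in Table 2. The names `gene_correlations`, `mutload`, and `expression`, the `min_pairs` cutoff, and the toy values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)

def gene_correlations(mutload, expression, min_pairs=3):
    """Correlate mutation load with each gene using pairwise-complete samples.

    mutload: {sample_id: value or None}
    expression: {gene_id: {sample_id: value or None}}
    Returns [(gene_id, r, n_pairs)] sorted by r, highest first.
    """
    results = []
    for gene, values in expression.items():
        pairs = [(mutload[s], v) for s, v in values.items()
                 if s in mutload and mutload[s] is not None and v is not None]
        if len(pairs) < min_pairs:
            continue  # hypothetical cutoff: too few complete pairs
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if not math.isnan(r):
            results.append((gene, r, len(pairs)))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Toy data; gene IDs mimic the SYMBOL|EntrezID style, values are made up.
mutload = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": None}
expression = {
    "GENEA|1": {"S1": 2.0, "S2": 4.0, "S3": 6.0, "S4": 9.0},
    "GENEB|2": {"S1": 3.0, "S2": 2.0, "S3": 1.0, "S4": 0.5},
}
for gene, r, n in gene_correlations(mutload, expression):
    print(gene, round(r, 4), n)
```

Sample S4 is dropped for both genes because its mutation load is missing. This shows how the effective n can shrink below the number of common samples.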