This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is also associated with differential expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th and 90th percentiles of the correlation coefficients are -0.14214, -0.0981, -0.0672, -0.042, -0.0175, 0.006, 0.031, 0.0613 and 0.10194, respectively.
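As a rough illustration of how a decile summary like this can be produced, the sketch below computes the 10th through 90th percentiles of a list of correlation coefficients with Python's standard library. The input values are invented for the example, the function name is hypothetical, and the interpolation rule (`method="inclusive"`) is an assumption: the report does not state which quantile definition the pipeline uses.

```python
import statistics

def decile_summary(coefficients):
    """Return a dict mapping each percentile (10, 20, ..., 90) to its value."""
    # quantiles(n=10) returns the 9 cut points that split the data into 10 groups.
    cuts = statistics.quantiles(coefficients, n=10, method="inclusive")
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}

# Hypothetical coefficients, one per gene (NOT taken from the report).
example = [i / 100 - 0.5 for i in range(101)]  # -0.50, -0.49, ..., 0.50
summary = decile_summary(example)

for percentile, value in summary.items():
    print(f"{percentile}th percentile: {value:.4f}")
```

A different interpolation rule would shift the cut points slightly when the number of genes is small, but makes little difference across thousands of genes.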
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ranked by correlation coefficient.
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 194 | 304 | 193 |
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the calculated correlation coefficients, in which their quantiles are plotted against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
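The points of a normal QQ plot can be sketched as follows. This is a minimal illustration, not the pipeline's plotting code: the function name and input values are hypothetical, and both the plotting positions (i + 0.5) / n and the choice to fit the reference normal to the sample mean and standard deviation are assumptions, since the report does not describe how the figure was drawn.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each sorted observation with the normal quantile at the same rank."""
    observed = sorted(values)
    n = len(observed)
    # Reference normal fitted to the sample (an assumption, see lead-in).
    reference = NormalDist(mean(observed), stdev(observed))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Hypothetical coefficients (NOT taken from the report).
points = normal_qq_points([0.10, -0.20, 0.30, 0.00, -0.10, 0.05, -0.05])

for theo, obs in points:
    print(f"theoretical={theo:+.4f}  observed={obs:+.4f}")
```

If the coefficients were normally distributed, the (theoretical, observed) pairs would lie close to the line y = x; systematic departures in the tails correspond to the deviations described in the caption.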

Table 2. Top 20 genes ranked by correlation coefficients
geneID | cor | p-value | q-value |
---|---|---|---|
MYL7\|58498 | 0.4827 | 6.64923105020421e-06 | 0.0126330189235998 |
PRSS37\|136242 | 0.4726 | 0.000137425189183471 | 0.0423618968759286 |
MAGEA4\|4103 | 0.3787 | 0.00029862119846058 | 0.0622802215913949 |
MAGEA8\|4107 | 0.3623 | 4.78680021336331e-05 | 0.0262088296248366 |
PRTN3\|5657 | 0.3622 | 0.00327062059122873 | 0.151741777277237 |
ADCYAP1R1\|117 | 0.3582 | 0.00292085363789241 | 0.147559903089859 |
GJB7\|375519 | 0.3546 | 1.59714488345841e-05 | 0.0156125478889854 |
C3orf49\|132200 | 0.3518 | 0.00628123699036576 | 0.18666153128069 |
SSX4\|6759 | 0.326 | 0.00627088193478142 | 0.186658804824664 |
PGPEP1L\|145814 | 0.3202 | 9.11345316541023e-05 | 0.0352651856849608 |
TAS2R13\|50838 | 0.3051 | 0.0187970016258188 | 0.274399823538777 |
PDP1\|54704 | 0.3041 | 1.70771108289891e-05 | 0.0156125478889854 |
KIR3DL3\|115653 | 0.304 | 0.00210728919112269 | 0.132829111132642 |
TRIM17\|51127 | 0.2992 | 2.37569987013853e-05 | 0.0180028556409206 |
C1orf49\|84066 | 0.2983 | 0.0115214676572508 | 0.235404592024457 |
RFPL4A\|342931 | 0.2972 | 0.0106795126898511 | 0.226947185829069 |
COX8C\|341947 | 0.2887 | 0.00851813236342114 | 0.208505078456986 |
GBP5\|115362 | 0.2879 | 4.89965473824405e-05 | 0.0262088296248366 |
LOC100271832\|100271832 | 0.2876 | 0.0150117184467025 | 0.254487010767607 |
P2RY6\|5031 | 0.2831 | 6.63859744596262e-05 | 0.0301840429374306 |
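The report does not say how the q-values in Table 2 were derived. One common way to turn a set of p-values into multiple-testing-adjusted values is the Benjamini-Hochberg procedure, sketched below purely as an assumption about what such a column could represent. The function name and the example p-values are hypothetical and are not taken from the table.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values, one per gene (NOT taken from Table 2).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

Because of the running minimum, neighbouring p-values can share the same adjusted value, which is one way identical q-values can arise for genes with slightly different p-values.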
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and the expression of each gene were calculated across all samples common to both data sets.
Correlations were computed with the pairwise.complete.obs setting, so for each gene only the samples with non-missing values for both variables contribute to the coefficient.
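To make the pairwise-complete idea concrete, here is a minimal standard-library sketch of a Pearson correlation that drops any sample where either value is missing. It is an illustration under stated assumptions, not the pipeline's implementation: the function names and data are hypothetical, and missing values are assumed to be encoded as None or NaN.

```python
import math

def _is_present(value):
    """True unless the value is None or NaN (both treated as missing)."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))

def pearson_pairwise_complete(x, y):
    """Return (r, n): Pearson r over the n samples where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if _is_present(a) and _is_present(b)]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical data: one mutation-load value and one expression value per sample.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = [2.0, 4.0, 6.0, float("nan"), 10.0]
r, n_used = pearson_pairwise_complete(mut_load, expression)
print(f"r={r:.4f} computed from {n_used} complete samples")
```

Returning the number of complete samples alongside r is a deliberate choice in this sketch: under pairwise-complete handling, different genes can be correlated over different numbers of samples, which affects how their p-values compare.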