Correlations between APOBEC_MutLoad_MinEstimate and mRNAseq expression
Overview
Introduction

This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.

Summary

The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0882, -0.059, -0.0376, -0.0193, -0.0015, 0.0169, 0.0366, 0.0601, and 0.0934, respectively.
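
As an illustration of how such a decile summary can be produced, here is a minimal sketch in Python. The function name and the input values are hypothetical, and the interpolation rule (`method="inclusive"`) is an assumption; the report does not state which quantile definition the pipeline uses.

```python
import statistics


def correlation_deciles(coefficients):
    """Return the 10th..90th percentiles of a list of correlation coefficients."""
    # Assumption: linear interpolation between order statistics ("inclusive").
    # The pipeline's actual quantile rule is not given in the report.
    return statistics.quantiles(coefficients, n=10, method="inclusive")


# Hypothetical per-gene coefficients (not taken from the report).
example = [-0.20, -0.10, -0.05, -0.02, 0.00, 0.01, 0.04, 0.08, 0.15, 0.30, 0.35]
deciles = correlation_deciles(example)

for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value:.4f}")
```

A median close to zero, as in the summary above, indicates that most genes show little linear association with APOBEC_MutLoad_MinEstimate.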

Results
Correlation results

The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 lists the top 20 genes ordered by correlation coefficient.

Table 1.  Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both

Category APOBEC_MutLoad_MinEstimate Expression Common
Sample 511 520 503

Figure 1.  Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot comparing the quantiles of the calculated correlation coefficients with those of a normal distribution. Points that fall off the blue line indicate deviation from normality.
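
To make the construction of the QQ plot concrete, the following sketch computes the point coordinates that such a plot would display. The function name, the plotting positions `(i - 0.5) / n`, and the choice of a reference normal with the sample mean and standard deviation are all assumptions for illustration; the report does not describe how the figure was generated.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted value with the matching quantile of a normal distribution."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: the reference normal uses the sample mean and standard deviation.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical correlation coefficients (not taken from the report).
example_cors = [-0.09, -0.04, 0.00, 0.03, 0.10]
points = qq_points(example_cors)

for theo, obs in points:
    print(f"theoretical={theo:+.4f}  observed={obs:+.4f}")
```

If the coefficients were normally distributed, the printed pairs would lie close to the line where theoretical and observed values are equal.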

Table 2.  Top 20 genes ranked by correlation coefficient

geneID cor p-value q-value
GAGE4|2576 0.3557 6.90672372849477e-06 0.00275938627570167
CPN1|1369 0.3265 1.56144580212292e-05 0.00407112590048528
RAX|30062 0.2944 4.60858122020014e-06 0.0021353449676404
KIR2DL1|3802 0.2593 1.07143369818541e-05 0.00312552515956372
TCL1B|9623 0.257 0.000371209246404325 0.0206729803952081
LTA4H|4048 0.2569 5.04571251447317e-09 9.27301045909878e-05
MAGEA1|4100 0.2528 6.26344467222761e-05 0.00844581122176194
?|340602 0.2508 0.000761353931289399 0.0305669812278026
PTCHD3|374308 0.2496 0.00131523262413058 0.038731032034741
GOLGA6L1|283767 0.2458 2.21354630847159e-07 0.000581150772244156
CTAG2|30848 0.2375 0.000939778129957602 0.0338651813183545
CT45A5|441521 0.2366 0.00147750845544659 0.0420761123405153
GLDC|2731 0.2334 1.23103093274679e-07 0.000581150772244156
C1orf126|200197 0.2326 1.43737367563901e-07 0.000581150772244156
TRIM15|89870 0.2324 1.24547276270093e-06 0.00104042265604171
KLRC2|3822 0.2305 3.49768792595739e-07 0.0007142278744805
CDCP1|64866 0.2299 1.86286452930062e-07 0.000581150772244156
FAM75C1|441452 0.2297 0.0026603650557937 0.0549968380150468
MAGEA4|4103 0.2288 3.27932536414721e-06 0.00177257181006757
DPPA2|151871 0.228 0.00341774010805995 0.0623930335252282
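
The report does not say how the q-values in Table 2 were derived from the p-values. Purely as an illustration of one common multiple-testing adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure; this is an assumption, not the pipeline's documented method, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values (not taken from Table 2).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

Adjustments of this kind can assign the same adjusted value to several tests with different raw p-values, because of the monotonicity step.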
Methods & Data
Input

Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used as input. A Pearson correlation coefficient was calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.
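
The restriction to common samples can be sketched as a set intersection, which also yields the three counts reported in Table 1. The function name and the sample identifiers below are hypothetical placeholders, not real sample IDs from the analysis.

```python
def sample_overlap(apobec_samples, expression_samples):
    """Count samples in each data set and return the ones present in both."""
    apobec = set(apobec_samples)
    expression = set(expression_samples)
    common = apobec & expression
    counts = {
        "APOBEC_MutLoad_MinEstimate": len(apobec),
        "Expression": len(expression),
        "Common": len(common),
    }
    return counts, sorted(common)


# Hypothetical sample identifiers for illustration only.
apobec_ids = ["S01", "S02", "S03", "S05"]
expression_ids = ["S02", "S03", "S04", "S05", "S06"]

counts, common_ids = sample_overlap(apobec_ids, expression_ids)
print(counts)
print(common_ids)
```

Only the samples in `common_ids` would then enter the per-gene correlation.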

Correlation across samples

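Pearson correlation coefficients were calculated with the pairwise.complete.obs option: for each gene, only samples with non-missing values for both APOBEC_MutLoad_MinEstimate and that gene's expression contribute to the coefficient.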
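
A minimal sketch of what pairwise-complete Pearson correlation means in practice is given below, in plain Python. The function name, gene labels, and data are hypothetical, and treating both `None` and NaN as missing is an assumption made for this example.

```python
import math


def pearson_pairwise_complete(x, y):
    """Pearson r using only positions where both values are present."""
    # Assumption: missing values are encoded as None or NaN.
    pairs = [
        (a, b)
        for a, b in zip(x, y)
        if a is not None and b is not None
        and not math.isnan(a) and not math.isnan(b)
    ]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough complete pairs
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values across five common samples (not real data).
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, float("nan"), 10.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
}

results = {gene: pearson_pairwise_complete(mut_load, values)
           for gene, values in expression.items()}
for gene, r in results.items():
    print(f"{gene}: r = {r:.4f}")
```

Because missing values are dropped per gene rather than globally, different genes may be correlated over different numbers of samples (three for `GENE_A` and four for `GENE_B` in this example).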