Correlations between APOBEC_MutLoad_MinEstimate and mRNAseq expression
Overview
Introduction

This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differences in expression.

Summary

The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.101, -0.0689, -0.0456, -0.0258, -0.0076, 0.0107, 0.03, 0.0526, and 0.0834, respectively.
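A decile summary like the one above can be sketched with the Python standard library. This is an illustration only: the function name `decile_summary`, the example values, and the choice of the "inclusive" interpolation method are assumptions, since the report does not state how its percentiles were computed.

```python
import statistics

def decile_summary(coefficients):
    """Map each percentile (10, 20, ..., 90) to its cut point.

    Assumption: linear interpolation with the 'inclusive' method;
    the report does not say which percentile definition it used.
    """
    cuts = statistics.quantiles(coefficients, n=10, method="inclusive")
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}

# Hypothetical coefficients for illustration, not values from this report.
example_coefficients = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
summary = decile_summary(example_coefficients)

for percentile, value in summary.items():
    print(f"{percentile}th percentile: {value:.4f}")
```

A median (50th percentile) close to zero with roughly symmetric tails, as in the summary above, suggests that most genes show little linear association with the feature.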

Results
Correlation results

The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of the coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.

Table 1.  Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both

Category APOBEC_MutLoad_MinEstimate Expression Common
Sample 248 545 247
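The three counts in Table 1 correspond to the sizes of two sample sets and of their intersection. A minimal sketch of that bookkeeping follows; the function name `count_samples` and the sample identifiers are hypothetical, and the pipeline's actual matching rule (for example, how sample barcodes are normalized) is not described in this report.

```python
def count_samples(apobec_ids, expression_ids):
    """Count samples in each data set and those present in both."""
    apobec = set(apobec_ids)
    expression = set(expression_ids)
    return {
        "APOBEC_MutLoad_MinEstimate": len(apobec),
        "Expression": len(expression),
        "Common": len(apobec & expression),  # set intersection
    }

# Hypothetical sample identifiers, for illustration only.
apobec_samples = ["S01", "S02", "S03", "S04"]
expression_samples = ["S02", "S03", "S04", "S05", "S06"]
counts = count_samples(apobec_samples, expression_samples)
print(counts)
```

Only the samples in the "Common" set can contribute to a correlation, which is why that count (247 here) bounds the sample size available for any gene.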

Figure 1.  Summary figures. Left: histogram showing the distribution of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients, plotting their quantiles against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
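The coordinates behind a QQ plot like the right-hand panel can be sketched as follows. This is not the pipeline's plotting code: the function name `qq_points`, the plotting positions (i + 0.5) / n, and the use of a normal distribution fitted to the sample mean and standard deviation are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot.

    Assumptions: plotting positions (i + 0.5) / n and a reference normal
    distribution fitted to the sample mean and standard deviation.
    """
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical coefficients for illustration, not values from this report.
example_values = [-0.12, -0.07, -0.03, 0.0, 0.02, 0.05, 0.09, 0.45]
points = qq_points(example_values)

for expected, observed in points:
    print(f"expected {expected:+.3f}  observed {observed:+.3f}")
```

If the coefficients were normally distributed, the pairs would fall near the line where expected equals observed; a heavy upper tail, such as a few genes with unusually high correlation, shows up as observed values rising above that line.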

Table 2.  Top 20 genes ranked by correlation coefficient

geneID cor p-value q-value
CELA1|1990 0.7329 5.46007683510652e-13 5.06231023766901e-09
MAGEE2|139599 0.4851 6.29405611629608e-07 0.000583553412822391
EMX1|2016 0.4636 4.18978506688816e-06 0.00242784951547835
TM4SF20|79853 0.4593 2.99552365987665e-05 0.0098202627745982
HRG|3273 0.4411 1.26646143261411e-06 0.00102104323238971
CRP|1401 0.4251 6.19478691676889e-06 0.00315492977913851
SSTR5|6755 0.4182 7.53746131945121e-05 0.01941210350647
ENPP7|339221 0.416 2.49461317487487e-05 0.00856622446327866
GTSF1L|149699 0.4065 3.38368747199169e-06 0.00216357644114283
FOXE1|2304 0.399 1.3156159194061e-05 0.00519052467947815
CD300E|342510 0.3954 1.0235691684457e-05 0.00441396350941597
EN1|2019 0.3938 3.8699899094663e-05 0.0115743907889086
KNG1|3827 0.3794 4.09914004606549e-06 0.00242784951547835
LOC285375|285375 0.3682 0.00171242044508668 0.147030362867385
GPR123|84435 0.3602 0.000336499822037872 0.052434589916372
HBG1|3047 0.3575 0.000905282095196469 0.102357596897732
C1QL2|165257 0.3567 0.00194917231931901 0.157145662248402
ADH7|131 0.3561 0.000297572100104837 0.0479815604542956
PAX7|5081 0.3549 0.00298293561193885 0.204105442996982
NLRP4|147945 0.3532 0.000143184802166907 0.0291766569953951
Methods & Data
Input

Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.

Correlation across samples

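Pearson correlation was computed with the pairwise.complete.obs option, so that each gene's coefficient uses only the samples with non-missing values for both APOBEC_MutLoad_MinEstimate and that gene.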
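The pairwise-complete behavior can be illustrated with a short standard-library sketch. It is not the pipeline's implementation: the names `pearson_complete` and `rank_genes`, the use of `None` or NaN to mark missing values, and the example data are assumptions chosen for illustration.

```python
import math

def _present(value):
    """True if the value is neither None nor NaN (assumed missing markers)."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))

def pearson_complete(x, y):
    """Pearson r over the positions where both x and y are present.

    Returns None when fewer than two complete pairs remain or when
    either variable has zero variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if _present(a) and _present(b)]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def rank_genes(feature, expression):
    """Correlate the feature with each gene; sort by coefficient, highest first."""
    scored = ((gene, pearson_complete(feature, values))
              for gene, values in expression.items())
    return sorted(((g, r) for g, r in scored if r is not None),
                  key=lambda item: item[1], reverse=True)

# Hypothetical data: one value per sample, in the same sample order.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [5.0, None, 3.0, 2.0, 1.0],
}
ranked = rank_genes(mut_load, expression)
for gene, r in ranked:
    print(f"{gene}\t{r:.4f}")
```

Because missing values are dropped per gene, different genes may be correlated over different numbers of samples. That may be why, in Table 2, a gene with a higher coefficient can have a larger p-value than a gene ranked below it.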