Correlations between APOBEC_MutLoad_MinEstimate and mRNAseq expression
Overview
Introduction

This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.

Summary

The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0969, -0.0672, -0.0458, -0.0256, -0.0072, 0.0123, 0.0337, 0.0583, 0.0912, respectively.
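
As an illustration of how such a decile summary can be produced, the following is a minimal sketch using only the Python standard library. The function name correlation_deciles is hypothetical, and the interpolation rule (method="inclusive") is an assumption; the report does not state which percentile definition the pipeline used, so values computed this way may differ slightly from those above.

```python
import statistics


def correlation_deciles(coefficients):
    """Return the 10th, 20th, ..., 90th percentiles of a list of coefficients.

    Assumption: linear interpolation with the "inclusive" rule, which treats
    the smallest and largest values as the 0th and 100th percentiles.
    """
    # n=10 yields the 9 cut points that separate the data into 10 groups.
    return statistics.quantiles(coefficients, n=10, method="inclusive")
```

Calling the function on the full list of per-gene correlation coefficients would return nine values, one per decile boundary.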

Results
Correlation results

The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.

Table 1.  Counts of mRNAseq and number of samples in APOBEC_MutLoad_MinEstimate and expression data sets and common to both

Category APOBEC_MutLoad_MinEstimate Expression Common
Sample 496 501 490
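
The three counts in Table 1 correspond to the sizes of two sets of sample identifiers and of their intersection. A minimal sketch of that bookkeeping follows; the function name sample_counts and the identifier format are hypothetical and are not taken from the pipeline.

```python
def sample_counts(apobec_samples, expression_samples):
    """Count samples in each data set and samples present in both.

    Both arguments are iterables of sample identifiers (hypothetical format).
    """
    apobec = set(apobec_samples)
    expression = set(expression_samples)
    # Only samples found in both data sets can contribute to a correlation.
    common = apobec & expression
    return {
        "apobec": len(apobec),
        "expression": len(expression),
        "common": len(common),
    }
```

Applied to the data behind this report, such a function would be expected to return 496, 501, and 490 for the three entries.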

Figure 1.  Summary figures. Left: histogram showing the distribution of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same correlation coefficients, which plots their quantiles against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
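
For readers unfamiliar with QQ plots, the points in the right panel can be thought of as pairs of (theoretical normal quantile, observed coefficient). The sketch below shows one way to compute such pairs with the Python standard library. The function name normal_qq_points is hypothetical, and the plotting positions (i - 0.5) / n are one common convention assumed here; the report does not say which convention the pipeline used.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted value with a standard normal quantile.

    Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    Returns a list of (theoretical_quantile, observed_value) tuples.
    """
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))
```

If the coefficients were normally distributed, these pairs would fall close to a straight line.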

Table 2.  Top 20 genes ranked by correlation coefficient

geneID cor p-value q-value
C19orf30|284424 0.2996 0.000160091990133449 0.0150002859505247
CCL1|6346 0.2995 5.65642783620923e-05 0.0087343957313152
C4orf17|84103 0.2723 0.00038586307187094 0.0214090315973816
IL1F7|27178 0.265 0.000149455708633228 0.0146125445560423
CYP2C9|1559 0.2496 0.00122584038559381 0.0379100671899525
PSG6|5675 0.2485 0.000584068984720965 0.0262029951000752
CEACAM8|1088 0.2479 0.000250374843245638 0.017748007449079
RETN|56729 0.2422 3.91832670065817e-06 0.00237148529631148
IL1F9|56300 0.2382 0.00200035313776015 0.047411060142106
CAPN2|824 0.2363 1.20891728627726e-07 0.000543710549503196
AZU1|566 0.2307 4.37725124169397e-06 0.00246083593243983
IFNB1|3456 0.23 0.00179067891770845 0.0440293613802182
CKLF|51192 0.2283 3.26873343059475e-07 0.000980075240273326
GABRA1|2554 0.2267 3.13556092081946e-05 0.00714034695766355
FAM163B|642968 0.225 0.00387739465356907 0.0650962987670342
CCL24|6369 0.2196 0.00156083418062947 0.0414762288176132
SLC20A1|6574 0.2178 1.13186721684144e-06 0.00196856583113628
CACYBP|27101 0.2155 1.46597339689691e-06 0.00196856583113628
FCRLB|127943 0.2149 1.58577323894349e-06 0.00196856583113628
ADRA1A|148 0.2148 2.88223659161702e-05 0.00691352483775868
Methods & Data
Input

Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and each gene across all samples common to both data sets.

Correlation across samples

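Pearson correlation with the pairwise.complete.obs option was used for this analysis. This is the missing-value setting of R's cor function in which each correlation is computed from only those samples that have non-missing values for both variables.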
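
To make the pairwise-complete behavior concrete, here is a minimal Python sketch of a Pearson correlation that drops any sample where either value is missing. It is an illustration, not the pipeline's code: the function name pearson_pairwise_complete is hypothetical, and representing missing values as None or NaN is an assumption.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption about the input encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both x and y are present.

    Returns NaN when fewer than two complete pairs remain or when either
    variable has zero variance among the complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not _is_missing(a) and not _is_missing(b)]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)
```

In this setting, x would hold the APOBEC_MutLoad_MinEstimate values and y the expression values of one gene, both ordered by the same list of common samples, with the function called once per gene.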