Correlations between APOBEC_MutLoad_MinEstimate and mRNAseq expression
Overview
Introduction

This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differential expression.

Summary

The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are -0.0901, -0.0555, -0.0299, -0.0076, 0.0129, 0.0343, 0.0565, 0.0819, and 0.1179, respectively.
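As an illustration of how such decile cut points can be obtained from a list of per-gene correlation coefficients, here is a minimal Python sketch. The input values are hypothetical, and the interpolation method (`inclusive`) is an assumption; the report does not say which quantile definition the pipeline uses, so the exact numbers may differ under another definition.

```python
from statistics import quantiles


def deciles(values):
    """Return the 9 cut points at the 10th, 20th, ..., 90th percentiles."""
    # Assumption: "inclusive" interpolation, which treats the minimum and
    # maximum of the data as the 0th and 100th percentiles.
    return quantiles(values, n=10, method="inclusive")


# Hypothetical correlation coefficients (illustrative only, not the report's data).
example_cors = [-0.20, -0.10, -0.05, -0.01, 0.00, 0.02, 0.04, 0.08, 0.12, 0.30, 0.58]

for pct, cut in zip(range(10, 100, 10), deciles(example_cors)):
    print(f"{pct}th percentile: {cut:.4f}")
```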

Results
Correlation results

The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.

Table 1.  Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and the number common to both

Category APOBEC_MutLoad_MinEstimate Expression Common
Sample 178 501 178

Figure 1.  Summary figures. Left: histogram showing the distribution of the correlations calculated across samples for all genes. Right: QQ plot of the same correlations, plotting the quantiles of the calculated correlation coefficients against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
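The points of a QQ plot like the one described in Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the reference distribution is taken to be the standard normal, and the plotting position `(i + 0.5) / n` is one common convention; the report does not specify either choice. The function name `qq_points` is hypothetical.

```python
from statistics import NormalDist


def qq_points(values):
    """Pair each sorted observation with the normal quantile at the same rank."""
    n = len(values)
    std_normal = NormalDist()  # assumption: mean 0, standard deviation 1
    ordered = sorted(values)
    # Assumption: plotting position (i + 0.5) / n for the i-th smallest value.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical correlation coefficients (illustrative only).
for theo, obs in qq_points([0.30, -0.10, 0.00, 0.05, -0.02]):
    print(f"theoretical={theo:+.3f}  observed={obs:+.3f}")
```

If the observed values were normally distributed, these points would fall close to a straight line when plotted.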

Table 2.  Top 20 genes ranked by correlation coefficients

geneID cor p-value q-value
TSPYL6|388951 0.5772 3.16434416447464e-08 0.000585846678610835
C11orf86|254439 0.4769 2.61463648874027e-05 0.0691533999321963
SLC10A1|6554 0.4138 2.28025533215437e-05 0.0691533999321963
FAM75C1|441452 0.4086 0.000885177113341795 0.372458388100227
ATP4B|496 0.4056 0.000802346794357867 0.372378037004076
ALPP|250 0.3523 7.41261393644166e-06 0.0410068770173548
OLFM4|10562 0.3471 4.32222384172398e-06 0.0400108261028389
C11orf85|283129 0.3425 0.00643617278344699 0.57937921309843
CYP4F8|11283 0.3382 0.0152177543398606 0.656740102210208
CGA|1081 0.3333 0.00236277267216645 0.492232359473852
C20orf141|128653 0.3273 0.00344135298672987 0.544557343558264
INHA|3623 0.3271 8.85964718966292e-06 0.0410068770173548
AMAC1|146861 0.3263 0.00849798649148492 0.591472638734405
TRIM17|51127 0.3156 1.77367862410627e-05 0.0656757720934069
C8G|733 0.313 3.8124254579941e-05 0.0882290561616283
LCE5A|254910 0.3036 0.0100476233776328 0.614590738754272
OR1F1|4992 0.3022 0.00291919812566244 0.510456062793209
CNTD2|79935 0.2988 6.85069099315427e-05 0.126833693047258
THRSP|7069 0.2988 0.00788721539538573 0.591472638734405
GPR78|27201 0.2968 0.0236579483031365 0.704713159984946
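The report does not state how the q-values in Table 2 were derived. One widely used way to turn p-values into q-values is the Benjamini-Hochberg procedure; the sketch below assumes that method purely for illustration, and the function name `bh_qvalues` and the input values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values (illustrative only, not the report's data).
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment depends on the total number of tests, the q-value of a gene can only be reproduced from the full set of p-values, not from the top 20 rows alone.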
Methods & Data
Input

Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used in this analysis. Pearson correlation coefficients were calculated between APOBEC_MutLoad_MinEstimate and the expression of each gene across all samples common to both data sets.

Correlation across samples

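Pearson correlation coefficients were calculated with the pairwise.complete.obs option (the name of a missing-data setting in R's cor function), so for each gene only the samples with non-missing values for both that gene's expression and APOBEC_MutLoad_MinEstimate contribute to the coefficient.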
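The idea behind pairwise-complete Pearson correlation can be sketched in Python as below. This is not the pipeline's own code: the function name is hypothetical, missing values are assumed to be encoded as `None`, and the sample data are invented for illustration.

```python
from math import sqrt


def pearson_pairwise_complete(x, y):
    """Pearson r using only positions where both x and y are present."""
    # Assumption: missing values are represented as None.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete pairs to define a correlation
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


# Hypothetical values for one gene across five samples (illustrative only).
mut_load = [3.0, 10.0, None, 25.0, 7.0]
expression = [5.1, 6.0, 4.8, None, 5.5]
print(pearson_pairwise_complete(mut_load, expression))
```

In this sketch the third and fourth samples are dropped for this gene only; another gene with values in those samples would still use them, which is why the effective sample size can vary from gene to gene.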