This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to determine whether APOBEC_MutLoad_MinEstimate is associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.0901, -0.0555, -0.0299, -0.0076, 0.0129, 0.0343, 0.0565, 0.0819, and 0.1179, respectively.
The number of samples used for the calculation is shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of those coefficients against a normal distribution. Table 2 shows the top 20 genes ordered by correlation coefficient.
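As a rough illustration of how a decile summary like the one above could be produced, the following sketch uses Python's standard library. The function name `decile_summary`, the example values, and the choice of the "inclusive" interpolation method are assumptions made for illustration; the report does not state which quantile definition the pipeline uses.

```python
from statistics import quantiles


def decile_summary(cors):
    """Return the 10th..90th percentiles of a list of correlation coefficients.

    Assumption: linear interpolation with the "inclusive" method; the
    pipeline's actual quantile definition is not given in the report.
    """
    cuts = quantiles(cors, n=10, method="inclusive")  # nine cut points
    return {10 * (k + 1): cut for k, cut in enumerate(cuts)}


# Hypothetical correlation coefficients, evenly spaced from -0.5 to 0.5.
example_cors = [i / 10 for i in range(-5, 6)]
deciles = decile_summary(example_cors)
```

Each key of `deciles` is a percentile (10, 20, ..., 90) and each value is the corresponding cut point of the correlation distribution.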
Table 1. Number of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and number common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 178 | 501 | 178 |
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot comparing the quantiles of the calculated correlation coefficients with those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
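The points of a normal QQ plot such as the right panel of Figure 1 can be computed along the following lines. This is only a sketch: the function name `normal_qq_points`, the plotting positions (i + 0.5) / n, the use of a reference normal fitted to the sample mean and standard deviation, and the example values are all assumptions, not details taken from the pipeline.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each ordered sample value with its theoretical normal quantile.

    Assumptions: plotting positions (i + 0.5) / n and a reference normal
    distribution fitted to the sample mean and standard deviation.
    """
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical correlation coefficients for illustration only.
example_values = [-0.09, -0.03, 0.01, 0.06, 0.12]
qq_points = normal_qq_points(example_values)
```

Plotting the second element of each pair against the first gives the QQ plot; points far from the diagonal reference line correspond to departures from normality.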

Table 2. Top 20 genes ranked by correlation coefficient
geneID | cor | p-value | q-value |
---|---|---|---|
TSPYL6\|388951 | 0.5772 | 3.16434416447464e-08 | 0.000585846678610835 |
C11orf86\|254439 | 0.4769 | 2.61463648874027e-05 | 0.0691533999321963 |
SLC10A1\|6554 | 0.4138 | 2.28025533215437e-05 | 0.0691533999321963 |
FAM75C1\|441452 | 0.4086 | 0.000885177113341795 | 0.372458388100227 |
ATP4B\|496 | 0.4056 | 0.000802346794357867 | 0.372378037004076 |
ALPP\|250 | 0.3523 | 7.41261393644166e-06 | 0.0410068770173548 |
OLFM4\|10562 | 0.3471 | 4.32222384172398e-06 | 0.0400108261028389 |
C11orf85\|283129 | 0.3425 | 0.00643617278344699 | 0.57937921309843 |
CYP4F8\|11283 | 0.3382 | 0.0152177543398606 | 0.656740102210208 |
CGA\|1081 | 0.3333 | 0.00236277267216645 | 0.492232359473852 |
C20orf141\|128653 | 0.3273 | 0.00344135298672987 | 0.544557343558264 |
INHA\|3623 | 0.3271 | 8.85964718966292e-06 | 0.0410068770173548 |
AMAC1\|146861 | 0.3263 | 0.00849798649148492 | 0.591472638734405 |
TRIM17\|51127 | 0.3156 | 1.77367862410627e-05 | 0.0656757720934069 |
C8G\|733 | 0.313 | 3.8124254579941e-05 | 0.0882290561616283 |
LCE5A\|254910 | 0.3036 | 0.0100476233776328 | 0.614590738754272 |
OR1F1\|4992 | 0.3022 | 0.00291919812566244 | 0.510456062793209 |
CNTD2\|79935 | 0.2988 | 6.85069099315427e-05 | 0.126833693047258 |
THRSP\|7069 | 0.2988 | 0.00788721539538573 | 0.591472638734405 |
GPR78\|27201 | 0.2968 | 0.0236579483031365 | 0.704713159984946 |
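The report does not say how the q-values in Table 2 are derived from the p-values. One common choice is the Benjamini-Hochberg adjustment, sketched below purely as an assumption. The function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the report does not name its q-value method; BH is shown
    here only as one plausible choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = bh_adjust(example_p)
```

Because the adjustment depends on each p-value's rank among all tested genes, a gene with a larger correlation coefficient can still receive a larger q-value than a gene listed below it.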
Gene-level (TCGA Level III) mRNAseq expression data and APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. Pearson correlation coefficients between APOBEC_MutLoad_MinEstimate and each gene were calculated across all samples common to both data sets.
Correlations were computed with the pairwise.complete.obs option, so for each gene only the samples with non-missing values for both variables contribute to the coefficient.
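The pairwise-complete calculation described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the pipeline's code: the function names, the use of `None` or NaN to mark missing values, the minimum of three complete pairs, and the example data are all assumptions.

```python
import math


def _present(value):
    """True if a value is neither None nor NaN."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def pearson_pairwise_complete(x, y):
    """Pearson r over positions where both x and y are present.

    Returns (r, n), where n is the number of complete pairs used.
    Assumption: r is reported as NaN when fewer than 3 pairs remain
    or when either variable is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if _present(a) and _present(b)]
    n = len(pairs)
    if n < 3:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical data: one mutation-load value per sample, one row per gene.
mut_load = [1.0, 2.0, None, 4.0, 5.0]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, None, 10.0],
    "GENE_B": [5.0, 3.0, 9.0, 2.0, 1.0],
}
results = {gene: pearson_pairwise_complete(mut_load, values)
           for gene, values in expression.items()}
```

In this example the two genes are correlated over different subsets of samples (three pairs for `GENE_A`, four for `GENE_B`), which is the practical consequence of pairwise deletion: the effective sample size can vary from gene to gene.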