This pipeline calculates the Pearson correlation between APOBEC_MutLoad_MinEstimate and the mRNAseq expression of each gene across samples, to assess whether APOBEC mutational load is associated with differences in gene expression.
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles of the correlation coefficients are -0.091, -0.0558, -0.03001, -0.0077, 0.0131, 0.03478, 0.0571, 0.08304, and 0.1193, respectively.
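The decile summary above is a plain quantile computation over the per-gene coefficients. Below is a minimal Python sketch, assuming the coefficients are already collected in a list. The function name is hypothetical, and dropping NaN values (for example from genes with constant expression) is an assumption, because the report does not say how such genes were treated.

```python
import statistics


def correlation_deciles(coefs):
    """Return the 10th, 20th, ..., 90th percentiles of correlation coefficients.

    NaN entries are dropped first (an assumption, not stated in the report).
    method="inclusive" interpolates linearly between order statistics and
    treats the minimum and maximum as the 0th and 100th percentiles.
    """
    finite = [c for c in coefs if c == c]  # c == c is False only for NaN
    return statistics.quantiles(finite, n=10, method="inclusive")
```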
The numbers of samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile (QQ) plot of the coefficients against a normal distribution. Table 2 lists the top 20 genes ranked by correlation coefficient.
Table 1. Numbers of samples in the APOBEC_MutLoad_MinEstimate and mRNAseq expression data sets, and common to both
Category | APOBEC_MutLoad_MinEstimate | Expression | Common |
---|---|---|---|
Sample | 177 | 501 | 177 |
Figure 1. Summary figures. Left: histogram of the correlation coefficients calculated across samples for all genes. Right: QQ plot of the same coefficients, plotting their quantiles against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
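The QQ plot can be reproduced from the coefficients alone. The sketch below is an illustration, not the pipeline's plotting code. It pairs the sorted coefficients with standard-normal quantiles at plotting positions (i - 0.5)/n (the rule R's ppoints() uses for more than 10 points) and fits a reference line through the first and third quartiles, as R's qqline() does by default. Whether the blue line in Figure 1 is drawn this way is an assumption, and the function name is hypothetical.

```python
import statistics


def normal_qq(coefs):
    """Coordinates for a normal QQ plot of correlation coefficients.

    Returns (theoretical, observed, slope, intercept):
      theoretical: standard-normal quantiles at plotting positions (i - 0.5) / n
      observed:    the sorted coefficients (NaN values dropped)
      slope, intercept: reference line through the first and third quartiles
    """
    observed = sorted(c for c in coefs if c == c)
    n = len(observed)
    z = statistics.NormalDist()  # standard normal: mean 0, sd 1
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    # Sample quartiles (linear interpolation) vs. matching normal quartiles
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    z1, z3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return theoretical, observed, slope, intercept
```

Plotting `observed` against `theoretical` and drawing `y = intercept + slope * x` gives the right-hand panel's layout; points far from the line in the tails indicate heavier or lighter tails than a normal distribution.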

Table 2. Top 20 genes ranked by correlation coefficient
Gene | Entrez Gene ID | cor | p-value | q-value |
---|---|---|---|---|
TSPYL6 | 388951 | 0.5781 | 2.98381297447747e-08 | 0.000552423134094759 |
C11orf86 | 254439 | 0.4779 | 2.50474980545423e-05 | 0.0662470541402566 |
SLC10A1 | 6554 | 0.4159 | 2.05020617163321e-05 | 0.0632625284360286 |
FAM75C1 | 441452 | 0.4083 | 0.000894303331782531 | 0.367936264102706 |
ATP4B | 496 | 0.4056 | 0.000801333902122892 | 0.362553969104832 |
ALPP | 250 | 0.3571 | 5.86888532749796e-06 | 0.0362188476510991 |
OLFM4 | 10562 | 0.3512 | 3.48935851168797e-06 | 0.0323009917426955 |
C11orf85 | 283129 | 0.3425 | 0.00688754967740124 | 0.579950713666998 |
CYP4F8 | 11283 | 0.3394 | 0.0148429297342312 | 0.645632246523302 |
CGA | 1081 | 0.3345 | 0.00242376193145444 | 0.476423030323529 |
AMAC1 | 146861 | 0.3274 | 0.00827759667052819 | 0.593874537637112 |
C20orf141 | 128653 | 0.3272 | 0.00345293134324942 | 0.532729757407665 |
INHA | 3623 | 0.3265 | 9.76297271670923e-06 | 0.0451879192192887 |
TRIM17 | 51127 | 0.316 | 1.83494550967112e-05 | 0.0632625284360286 |
C8G | 733 | 0.313 | 4.02855914163336e-05 | 0.09323092993525 |
LCE5A | 254910 | 0.3046 | 0.00979684226331745 | 0.596182849930428 |
OR1F1 | 4992 | 0.3028 | 0.00285761779867921 | 0.497923409941477 |
THRSP | 7069 | 0.2994 | 0.00774556144293648 | 0.585994538325235 |
GPR78 | 27201 | 0.2975 | 0.0233062458127198 | 0.683193044987412 |
TMED1 | 11018 | 0.2969 | 5.99622199286554e-05 | 0.123348948862125 |
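The p-values in Table 2 come from testing whether each correlation differs from zero. The report does not name the test. A common choice, assumed here, converts r and the number of samples n into t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The two-sided p-value is then `2 * scipy.stats.t.sf(abs(t), n - 2)`. Because pairwise.complete.obs can leave a different n for each gene, a larger r does not always come with a smaller p-value. For example, SLC10A1 has a lower correlation than C11orf86 in Table 2 but a smaller p-value. The helper names below are hypothetical.

```python
import math


def pearson_t(r, n):
    """t statistic for H0: rho = 0, from a Pearson r computed on n samples.

    Under H0 (and approximate bivariate normality) it follows a Student t
    distribution with n - 2 degrees of freedom.
    """
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


def top_by_correlation(results, k=20):
    """Rank (gene_id, r, n) records by r, largest first, and keep the top k.

    Records whose r is NaN are skipped so they cannot disturb the sort.
    """
    finite = [rec for rec in results if rec[1] == rec[1]]
    return sorted(finite, key=lambda rec: rec[1], reverse=True)[:k]
```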
Gene-level (TCGA Level III) mRNAseq expression data and the APOBEC_MutLoad_MinEstimate values derived by the Mutation_APOBEC pipeline were used for this analysis. For each gene, the Pearson correlation coefficient between APOBEC_MutLoad_MinEstimate and expression was calculated across all samples common to both data sets.
Missing values were handled with the pairwise.complete.obs option, so each gene's correlation uses only the samples that have non-missing values for both variables.
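The per-gene computation described above can be sketched in Python as follows. The input layout (a sample-to-value dict for the mutation load and a gene-to-sample-to-value dict for expression) and the function names are hypothetical. Returning NaN when fewer than two complete pairs remain or when a variable is constant is also an assumption. The report only states that Pearson correlation with pairwise.complete.obs was used over the common samples.

```python
import math
import statistics


def _missing(value):
    """True for None or NaN, the two missing-value markers assumed here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pairwise_complete_pearson(x, y):
    """Pearson r over positions where both x[i] and y[i] are present.

    Returns (r, n_used). r is NaN if fewer than two complete pairs remain
    or if either variable is constant over those pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if not _missing(a) and not _missing(b)]
    n = len(pairs)
    if n < 2:
        return math.nan, n
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return math.nan, n
    return sxy / math.sqrt(sxx * syy), n


def correlate_genes(mutload, expression):
    """Correlate mutation load with every gene over the common samples.

    mutload:    {sample_id: value}
    expression: {gene_id: {sample_id: value}}
    Returns {gene_id: (r, n_used)}.
    """
    expression_samples = set().union(*(vals.keys() for vals in expression.values()))
    common = sorted(set(mutload) & expression_samples)  # "Common" samples, as in Table 1
    x = [mutload[s] for s in common]
    results = {}
    for gene, vals in expression.items():
        y = [vals.get(s) for s in common]  # sample absent for this gene -> None -> missing
        results[gene] = pairwise_complete_pearson(x, y)
    return results
```

Filtering per gene, rather than once across all genes, is what distinguishes pairwise-complete handling from dropping every sample that has any missing value. It keeps as many samples as possible for each gene, at the cost of `n_used` varying between genes.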