This pipeline identifies pathway gene sets that significantly overlap a given gene list, using a hypergeometric test. The gene set database is the GSEA MSigDB Class 2: Canonical Pathways collection. For further details about the MSigDB gene sets, please visit The Broad Institute GSEA MSigDB.
For the given gene list, a hypergeometric test was run against 1320 canonical pathway gene sets to find significant overlaps. The top 5 gene sets, ranked by FDR-adjusted p-value, are listed below.
- KEGG_ENDOMETRIAL_CANCER, KEGG_PROSTATE_CANCER, KEGG_SMALL_CELL_LUNG_CANCER, KEGG_PATHWAYS_IN_CANCER, BIOCARTA_PTEN_PATHWAY
GS(gene set) pathway name | gene.list | GS size (m) | n.NotInGS (n) | Gene universe (N) | n.drawn (k) | n.found (x) | p.value (p(X>=x)) | FDR (q.value) |
---|---|---|---|---|---|---|---|---|
KEGG ENDOMETRIAL CANCER | gene.list | 52 | 45904 | 45956 | 57 | 6 | 5.407e-11 | 7.137e-08 |
KEGG PROSTATE CANCER | gene.list | 89 | 45867 | 45956 | 57 | 6 | 1.490e-09 | 6.555e-07 |
KEGG SMALL CELL LUNG CANCER | gene.list | 84 | 45872 | 45956 | 57 | 6 | 1.047e-09 | 6.555e-07 |
KEGG PATHWAYS IN CANCER | gene.list | 328 | 45628 | 45956 | 57 | 8 | 7.539e-09 | 1.990e-06 |
BIOCARTA PTEN PATHWAY | gene.list | 18 | 45938 | 45956 | 57 | 4 | 6.421e-09 | 1.990e-06 |
PID ECADHERIN KERATINOCYTE PATHWAY | gene.list | 21 | 45935 | 45956 | 57 | 4 | 1.252e-08 | 2.755e-06 |
KEGG COLORECTAL CANCER | gene.list | 62 | 45894 | 45956 | 57 | 5 | 1.503e-08 | 2.835e-06 |
BIOCARTA CTCF PATHWAY | gene.list | 23 | 45933 | 45956 | 57 | 4 | 1.850e-08 | 3.052e-06 |
BIOCARTA GSK3 PATHWAY | gene.list | 27 | 45929 | 45956 | 57 | 4 | 3.652e-08 | 5.357e-06 |
REACTOME PI3K EVENTS IN ERBB4 SIGNALING | gene.list | 38 | 45918 | 45956 | 57 | 4 | 1.521e-07 | 1.673e-05 |
REACTOME PI3K AKT ACTIVATION | gene.list | 38 | 45918 | 45956 | 57 | 4 | 1.521e-07 | 1.673e-05 |
REACTOME GAB1 SIGNALOSOME | gene.list | 38 | 45918 | 45956 | 57 | 4 | 1.521e-07 | 1.673e-05 |
REACTOME PI3K EVENTS IN ERBB2 SIGNALING | gene.list | 44 | 45912 | 45956 | 57 | 4 | 2.781e-07 | 2.824e-05 |
REACTOME PI 3K CASCADE | gene.list | 56 | 45900 | 45956 | 57 | 4 | 7.442e-07 | 7.017e-05 |
SA PTEN PATHWAY | gene.list | 17 | 45939 | 45956 | 57 | 3 | 1.215e-06 | 1.069e-04 |
KEGG GLIOMA | gene.list | 65 | 45891 | 45956 | 57 | 4 | 1.360e-06 | 1.122e-04 |
SIG PIP3 SIGNALING IN CARDIAC MYOCTES | gene.list | 67 | 45889 | 45956 | 57 | 4 | 1.537e-06 | 1.194e-04 |
KEGG MELANOMA | gene.list | 71 | 45885 | 45956 | 57 | 4 | 1.942e-06 | 1.341e-04 |
BIOCARTA IGF1MTOR PATHWAY | gene.list | 20 | 45936 | 45956 | 57 | 3 | 2.032e-06 | 1.341e-04 |
PID AP1 PATHWAY | gene.list | 70 | 45886 | 45956 | 57 | 4 | 1.834e-06 | 1.341e-04 |
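
The report does not name the procedure behind the FDR (q.value) column. The values in the table are consistent with a Benjamini-Hochberg adjustment over all 1320 tests, so the following minimal sketch assumes that method. The function name `bh_adjust` and the `n_tests` parameter are illustrative, not part of the pipeline.

```python
def bh_adjust(p_values, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    n_tests is the total number of tests performed. It defaults to
    len(p_values); pass it explicitly when only the smallest p-values
    of a larger run are available (an assumption of this sketch).
    """
    n = len(p_values) if n_tests is None else n_tests
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(len(order), 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# p-values of the first five table rows, in table order.
top5_p = [5.407e-11, 1.490e-09, 1.047e-09, 7.539e-09, 6.421e-09]
top5_q = bh_adjust(top5_p, n_tests=1320)
```

With only the five smallest p-values supplied, the result matches a full adjustment only if no larger p-value would lower the running minimum, which holds for the rows shown here. The computed values approximately reproduce the table's FDR column, up to rounding of the displayed p-values.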
- Gene set database = c2.cp.v4.0.symbols.gmt
- Input gene list = MutSig2CV.input.genenames.txt
For a given gene list, the pipeline uses a hypergeometric test to assess the significance of the overlap with each pathway gene set. The hypergeometric p-value is obtained with the R function phyper() and is defined as the probability of randomly drawing x or more successes (gene matches) in k total draws (the input genes) from a population of N genes, m of which belong to the gene set.
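The definition above can be checked without R. The following is a minimal sketch in Python using exact integer arithmetic; the function name `hypergeom_upper_tail` is hypothetical, and the argument names follow the table columns (m = GS size, n = n.NotInGS, k = n.drawn, x = n.found). It is an illustration of the tail probability, not the pipeline's own code.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(x, m, n, k):
    """P(X >= x) when k genes are drawn without replacement from a
    universe of m gene-set genes and n other genes."""
    total = comb(m + n, k)  # all ways to draw k genes from N = m + n
    largest = min(m, k)     # cannot match more genes than m or k
    favourable = sum(
        comb(m, i) * comb(n, k - i)  # exactly i matches
        for i in range(max(x, 0), largest + 1)
    )
    # Exact ratio of big integers, converted to float at the end.
    return float(Fraction(favourable, total))


# First table row: KEGG ENDOMETRIAL CANCER.
p_endometrial = hypergeom_upper_tail(x=6, m=52, n=45904, k=57)
```

For the first table row this should agree with the reported p.value to the displayed precision. Summing exact binomial coefficients avoids the cancellation that computing 1 minus a lower tail in floating point would cause for p-values this small.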
- A cumulative p-value is computed with the R function phyper():
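- ex). The probability of seeing at least x genes in the group is p(X>=x) = 1 - p(X<=x-1) = phyper(x-1, m, n, k, lower.tail=FALSE, log.p=FALSE), where m is the gene set size, n = N - m is the number of genes not in the gene set, and k is the number of input genes. The probability of exactly x matches is $f(x \mid N, m, k) = \binom{m}{x}\binom{N-m}{k-x} / \binom{N}{k}$.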
- The hypergeometric test is identical to the corresponding one-tailed version of Fisher's exact test.
- ex). Fisher's exact test is applied to the 2x2 contingency table matrix(c(n.Found, n.GS-n.Found, n.drawn-n.Found, n.NotGS- (n.drawn-n.Found)), nrow=2, dimnames = list(inputGenes = c("Found", "NotFound"),GeneUniverse = c("GS", "nonGS")) )
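To illustrate the equivalence stated above, the sketch below builds the same 2x2 table in Python and sums the probabilities of all tables with the observed margins that contain at least as many gene-set matches. The function names `contingency_table` and `fisher_greater` are hypothetical, and the one-sided ("greater") direction is assumed from the definition p(X>=x).

```python
from fractions import Fraction
from math import comb


def contingency_table(n_found, gs_size, n_drawn, n_not_in_gs):
    """Rows: input genes (drawn / not drawn). Columns: GS / nonGS."""
    drawn_not_gs = n_drawn - n_found
    return [
        [n_found, drawn_not_gs],
        [gs_size - n_found, n_not_in_gs - drawn_not_gs],
    ]


def fisher_greater(table):
    """One-sided Fisher p-value: probability of a top-left count at
    least as large as observed, with all margins held fixed."""
    (a, b), (c, d) = table
    row1 = a + b            # number of input genes
    col1 = a + c            # gene set size
    total = a + b + c + d   # gene universe
    denom = comb(total, row1)
    numer = sum(
        comb(col1, i) * comb(total - col1, row1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return float(Fraction(numer, denom))


# First table row: n.found = 6, GS size = 52, n.drawn = 57, n.NotInGS = 45904.
table = contingency_table(6, 52, 57, 45904)
p_fisher = fisher_greater(table)
```

Because the margins fix the gene set size, the number of input genes, and the universe size, this sum is the same hypergeometric upper tail that phyper() returns, which is why the two tests give the same p-value.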
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.