This pipeline identifies pathway gene sets that significantly overlap a given gene list, using a hypergeometric test. The gene set database is GSEA MSigDB Class 2: Canonical Pathways. For further details about the MSigDB gene sets, please visit The Broad Institute GSEA MSigDB.
For the given gene list, a hypergeometric test was run against 1320 canonical pathway gene sets to find significant overlaps. The top 5 gene sets, ranked by FDR-adjusted p-value, are listed below.
- KEGG_PATHWAYS_IN_CANCER, KEGG_BLADDER_CANCER, KEGG_PROSTATE_CANCER, KEGG_MELANOMA, KEGG_CHRONIC_MYELOID_LEUKEMIA
GS(gene set) pathway name | gene.list | GS size (m) | n.NotInGS (n) | Gene universe (N) | n.drawn (k) | n.found (x) | p.value (p(X>=x)) | FDR (q.value) |
---|---|---|---|---|---|---|---|---|
KEGG PATHWAYS IN CANCER | gene.list | 328 | 45628 | 45956 | 101 | 19 | 9.371e-22 | 1.237e-18 |
KEGG BLADDER CANCER | gene.list | 42 | 45914 | 45956 | 101 | 10 | 2.307e-18 | 1.523e-15 |
KEGG PROSTATE CANCER | gene.list | 89 | 45867 | 45956 | 101 | 11 | 1.044e-16 | 4.594e-14 |
KEGG MELANOMA | gene.list | 71 | 45885 | 45956 | 101 | 10 | 6.871e-16 | 2.267e-13 |
KEGG CHRONIC MYELOID LEUKEMIA | gene.list | 73 | 45883 | 45956 | 101 | 10 | 9.212e-16 | 2.432e-13 |
KEGG NON SMALL CELL LUNG CANCER | gene.list | 54 | 45902 | 45956 | 101 | 9 | 4.067e-15 | 8.948e-13 |
KEGG GLIOMA | gene.list | 65 | 45891 | 45956 | 101 | 9 | 2.397e-14 | 4.520e-12 |
KEGG ENDOMETRIAL CANCER | gene.list | 52 | 45904 | 45956 | 101 | 8 | 2.849e-13 | 4.179e-11 |
REACTOME DEVELOPMENTAL BIOLOGY | gene.list | 396 | 45560 | 45956 | 101 | 14 | 2.575e-13 | 4.179e-11 |
REACTOME IMMUNE SYSTEM | gene.list | 933 | 45023 | 45956 | 101 | 18 | 2.247e-12 | 2.967e-10 |
KEGG CELL CYCLE | gene.list | 128 | 45828 | 45956 | 101 | 9 | 1.276e-11 | 1.531e-09 |
PID CMYB PATHWAY | gene.list | 84 | 45872 | 45956 | 101 | 8 | 1.558e-11 | 1.714e-09 |
PID P53DOWNSTREAMPATHWAY | gene.list | 137 | 45819 | 45956 | 101 | 9 | 2.359e-11 | 2.395e-09 |
REACTOME SIGNALING BY ERBB4 | gene.list | 90 | 45866 | 45956 | 101 | 8 | 2.741e-11 | 2.584e-09 |
REACTOME HEMOSTASIS | gene.list | 466 | 45490 | 45956 | 101 | 13 | 3.686e-11 | 3.244e-09 |
KEGG THYROID CANCER | gene.list | 29 | 45927 | 45956 | 101 | 6 | 4.419e-11 | 3.646e-09 |
KEGG REGULATION OF ACTIN CYTOSKELETON | gene.list | 216 | 45740 | 45956 | 101 | 10 | 5.649e-11 | 4.386e-09 |
REACTOME SIGNALING BY ERBB2 | gene.list | 101 | 45855 | 45956 | 101 | 8 | 7.006e-11 | 5.137e-09 |
KEGG PANCREATIC CANCER | gene.list | 70 | 45886 | 45956 | 101 | 7 | 2.145e-10 | 1.490e-08 |
PID ER NONGENOMIC PATHWAY | gene.list | 41 | 45915 | 45956 | 101 | 6 | 4.095e-10 | 2.703e-08 |
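
The report does not name the procedure behind the FDR (q.value) column. The listed values are consistent with a Benjamini-Hochberg adjustment over the 1320 tested gene sets (for the first row, 9.371e-22 × 1320 / 1 ≈ 1.237e-18), so the sketch below assumes that method. The function name `benjamini_hochberg` is hypothetical and the code is an illustration, not the pipeline's own implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Assumption: the report's FDR column uses this procedure; the source
    does not state it explicitly.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * n / rank over its own rank and every larger rank (monotonicity).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Toy p-values, not taken from the report.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The adjustment needs the p-values of all tested gene sets, not just the rows shown in the table, because each q-value depends on the ranks below it.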
- Gene set database = c2.cp.v4.0.symbols.gmt
- Input gene list = MutSig2CV.input.genenames.txt
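
As a rough sketch of how the overlap counts might be derived from these two inputs: a GMT file stores one gene set per line as tab-separated fields (set name, description, then gene symbols). The helper names `parse_gmt` and `overlap_counts` are hypothetical, the input gene list is assumed to hold one gene symbol per entry, and the toy data below is invented for illustration only.

```python
def parse_gmt(lines):
    """Map gene-set name -> set of gene symbols from GMT-formatted lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def overlap_counts(gene_list, gene_sets):
    """Return {name: (gene set size m, number of input genes found x)}."""
    drawn = set(gene_list)
    return {
        name: (len(genes), len(drawn & genes))
        for name, genes in gene_sets.items()
    }


if __name__ == "__main__":
    # Toy inputs, not taken from c2.cp.v4.0.symbols.gmt.
    toy_gmt = ["SET_A\tdesc\tG1\tG2\tG3\n", "SET_B\tdesc\tG3\tG4\n"]
    toy_genes = ["G2", "G3", "G9"]
    print(overlap_counts(toy_genes, parse_gmt(toy_gmt)))
```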
For a given gene list, the pipeline uses a hypergeometric test to assess the significance of its overlap with each pathway gene set. The hypergeometric p-value is obtained with the R function phyper() and is defined as the probability of randomly drawing x or more successes (gene matches) in k total draws (the input genes) from a population of N genes, of which m belong to the gene set.
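
To make the definition concrete, here is a minimal Python sketch of the same upper-tail probability that R's phyper() returns for `phyper(x - 1, m, n, k, lower.tail=FALSE)`. It uses exact integer binomial coefficients from the standard library; the function name `hypergeom_upper_tail` is hypothetical and this is an illustration, not the pipeline's code.

```python
from math import comb


def hypergeom_upper_tail(x, m, n, k):
    """P(X >= x) when k genes are drawn from m in-set and n out-of-set genes."""
    total = comb(m + n, k)  # all ways to draw k genes from the universe
    upper = min(m, k)       # most matches that are possible
    # Count the draws with i matches, for every i from x up to the maximum.
    # comb() returns 0 for impossible combinations, so no extra guard is needed.
    favorable = sum(
        comb(m, i) * comb(n, k - i) for i in range(max(x, 0), upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # First table row (KEGG PATHWAYS IN CANCER): m=328, n=45628, k=101, x=19.
    # The report lists p = 9.371e-22 for this row.
    print(hypergeom_upper_tail(19, 328, 45628, 101))
```

Exact integer arithmetic avoids the underflow that a naive floating-point product of factorials would hit at p-values this small.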
- A cumulative p-value is computed using the R function phyper():
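- ex). The probability of seeing at least x genes in the group is p(X>=x) = 1 - p(X<=x-1) = 1 - phyper(x-1, m, n, k, lower.tail=TRUE, log.p=FALSE) = phyper(x-1, m, n, k, lower.tail=FALSE, log.p=FALSE), where n = N - m and the probability of exactly x matches is $f(x \mid N, m, k) = \binom{m}{x}\binom{N-m}{k-x} / \binom{N}{k}$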
- The hypergeometric test is identical to the corresponding one-tailed version of Fisher's exact test.
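- ex). Fisher's exact test is applied to the contingency table matrix(c(n.Found, n.GS-n.Found, n.drawn-n.Found, n.NotGS- (n.drawn-n.Found)), nrow=2, dimnames = list(inputGenes = c("Found", "NotFound"),GeneUniverse = c("GS", "nonGS")) ), for example by passing it to fisher.test() with alternative="greater"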
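
The equivalence can be illustrated with a short Python sketch that builds the same 2x2 table and sums the hypergeometric probabilities of every table at least as extreme, which is how the one-tailed Fisher p-value is defined. The function names `contingency_table` and `fisher_one_sided` are hypothetical, and the code is a sketch rather than a replacement for R's fisher.test().

```python
from math import comb


def contingency_table(n_found, n_gs, n_drawn, n_not_gs):
    """Rows: Found / NotFound among input genes; columns: GS / nonGS."""
    return [
        [n_found, n_drawn - n_found],
        [n_gs - n_found, n_not_gs - (n_drawn - n_found)],
    ]


def fisher_one_sided(table):
    """One-tailed (greater) Fisher p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1 = a + b            # number of input genes drawn
    col1 = a + c            # gene set size
    total = a + b + c + d   # gene universe
    upper = min(row1, col1)
    # Sum over every table whose top-left cell is at least as large as observed.
    favorable = sum(
        comb(col1, i) * comb(total - col1, row1 - i) for i in range(a, upper + 1)
    )
    return favorable / comb(total, row1)


if __name__ == "__main__":
    # First table row: n.Found=19, n.GS=328, n.drawn=101, n.NotGS=45628.
    table = contingency_table(19, 328, 101, 45628)
    print(table, fisher_one_sided(table))
```

The four cells sum to the gene universe N, and the result agrees with the hypergeometric upper tail p(X>=x) for the same row.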
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.