This pipeline identifies pathway gene sets that significantly overlap a given gene list, using a hypergeometric test. The gene set database is GSEA MSigDB Class2: Canonical Pathways DB. For further details about the MSigDB gene sets, please visit The Broad Institute GSEA MSigDB.
For the given gene list, a hypergeometric test was run against 1320 gene sets to find significantly overlapping canonical pathways. The top 5 overlapping gene sets, ranked by FDR-adjusted p-value, are listed below.
- KEGG_RENAL_CELL_CARCINOMA, PID_ERBB2ERBB3PATHWAY, PID_PI3KCIPATHWAY, KEGG_ACUTE_MYELOID_LEUKEMIA, KEGG_VEGF_SIGNALING_PATHWAY
GS(gene set) pathway name | gene.list | GS size (m) | n.NotInGS (n) | Gene universe (N) | n.drawn (k) | n.found (x) | p.value (p(X>=x)) | FDR (q.value) |
---|---|---|---|---|---|---|---|---|
KEGG RENAL CELL CARCINOMA | gene.list | 70 | 45886 | 45956 | 64 | 5 | 5.040e-08 | 6.653e-05 |
PID ERBB2ERBB3PATHWAY | gene.list | 44 | 45912 | 45956 | 64 | 4 | 4.452e-07 | 2.938e-04 |
PID PI3KCIPATHWAY | gene.list | 49 | 45907 | 45956 | 64 | 4 | 6.912e-07 | 3.041e-04 |
KEGG ACUTE MYELOID LEUKEMIA | gene.list | 60 | 45896 | 45956 | 64 | 4 | 1.573e-06 | 5.190e-04 |
KEGG VEGF SIGNALING PATHWAY | gene.list | 76 | 45880 | 45956 | 64 | 4 | 4.069e-06 | 7.838e-04 |
KEGG B CELL RECEPTOR SIGNALING PATHWAY | gene.list | 75 | 45881 | 45956 | 64 | 4 | 3.859e-06 | 7.838e-04 |
KEGG FC EPSILON RI SIGNALING PATHWAY | gene.list | 79 | 45877 | 45956 | 64 | 4 | 4.750e-06 | 7.838e-04 |
REACTOME SIGNALING BY SCF KIT | gene.list | 78 | 45878 | 45956 | 64 | 4 | 4.515e-06 | 7.838e-04 |
KEGG PATHWAYS IN CANCER | gene.list | 328 | 45628 | 45956 | 64 | 6 | 6.683e-06 | 9.802e-04 |
PID ERBB1 DOWNSTREAM PATHWAY | gene.list | 105 | 45851 | 45956 | 64 | 4 | 1.471e-05 | 1.942e-03 |
PID ERBB1 RECEPTOR PROXIMAL PATHWAY | gene.list | 35 | 45921 | 45956 | 64 | 3 | 1.633e-05 | 1.960e-03 |
KEGG NEUROTROPHIN SIGNALING PATHWAY | gene.list | 126 | 45830 | 45956 | 64 | 4 | 3.013e-05 | 2.295e-03 |
PID EPHBFWDPATHWAY | gene.list | 40 | 45916 | 45956 | 64 | 3 | 2.453e-05 | 2.295e-03 |
PID PDGFRBPATHWAY | gene.list | 129 | 45827 | 45956 | 64 | 4 | 3.304e-05 | 2.295e-03 |
PID ERBB1 INTERNALIZATION PATHWAY | gene.list | 41 | 45915 | 45956 | 64 | 3 | 2.644e-05 | 2.295e-03 |
PID CXCR3PATHWAY | gene.list | 43 | 45913 | 45956 | 64 | 3 | 3.055e-05 | 2.295e-03 |
REACTOME SIGNALING BY THE B CELL RECEPTOR BCR | gene.list | 126 | 45830 | 45956 | 64 | 4 | 3.013e-05 | 2.295e-03 |
REACTOME SIGNALING BY FGFR MUTANTS | gene.list | 44 | 45912 | 45956 | 64 | 3 | 3.275e-05 | 2.295e-03 |
REACTOME IL 2 SIGNALING | gene.list | 41 | 45915 | 45956 | 64 | 3 | 2.644e-05 | 2.295e-03 |
KEGG NATURAL KILLER CELL MEDIATED CYTOTOXICITY | gene.list | 137 | 45819 | 45956 | 64 | 4 | 4.180e-05 | 2.759e-03 |
- Gene set database = c2.cp.v4.0.symbols.gmt
- Input gene list = MutSig2CV.input.genenames.txt
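The report does not name the method behind the FDR (q.value) column. The first rows of the table are, however, consistent with a Benjamini-Hochberg adjustment over the 1320 gene sets tested (for example, 5.040e-08 × 1320 / 1 ≈ 6.653e-05). The following is a minimal Python sketch under that assumption, not the pipeline's own code. The `bh_adjust` function and its `n_tests` parameter are hypothetical names introduced for illustration.

```python
def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    n_tests (hypothetical convenience parameter) is the total number of
    hypotheses tested, for when only the smallest p-values are at hand.
    """
    total = len(pvalues) if n_tests is None else n_tests
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is capped
    # by the q-value of the next larger p-value (keeps q monotone in rank).
    for rank in range(len(order), 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * total / rank)
        adjusted[idx] = running_min
    return adjusted


# The four smallest p-values from the table, adjusted for 1320 gene sets.
top_p = [5.040e-08, 4.452e-07, 6.912e-07, 1.573e-06]
q = bh_adjust(top_p, n_tests=1320)
print(["%.3e" % v for v in q])
```

Because the inputs are the rounded p-values printed in the table, the results should agree with the FDR column only up to rounding in the last digit. Passing a truncated list with `n_tests` is exact only when no omitted p-value would lower the running minimum, which holds for these top rows.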
For a given gene list, the pipeline uses a hypergeometric test to assess the significance of each overlapping pathway gene set. The hypergeometric p-value is obtained with the R function phyper() and is defined as the probability of randomly drawing x or more successes (gene matches) in k total draws (the input genes) from a population of N genes, of which m belong to the gene set.
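The tail probability described above can be sketched in plain Python with exact integer binomial coefficients. This is an illustrative re-implementation, not the pipeline's R code. The function name `hypergeom_upper_tail` is an assumption, while the arguments follow the table's column symbols (x = n.found, m = GS size, n = n.NotInGS, k = n.drawn).

```python
from math import comb


def hypergeom_upper_tail(x, m, n, k):
    """P(X >= x) when k genes are drawn without replacement from a
    universe of m gene-set genes and n non-gene-set genes."""
    total = comb(m + n, k)  # all ways to draw k genes from N = m + n
    # Count the draws that contain at least x gene-set genes.
    ways = sum(
        comb(m, i) * comb(n, k - i)
        for i in range(max(x, 0), min(m, k) + 1)
    )
    return ways / total


# First table row: KEGG RENAL CELL CARCINOMA
p = hypergeom_upper_tail(x=5, m=70, n=45886, k=64)
print(f"{p:.3e}")
```

With the values from the first table row, the result should agree closely with the reported p-value of 5.040e-08.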
- A cumulative p-value using the R function phyper():
- ex). The probability of seeing at least x genes in the gene set is p(X>=x) = 1 - p(X<=x-1) = phyper(x-1, m, n, k, lower.tail=FALSE, log.p=FALSE), where the probability of exactly x matches is $f(x \mid N, m, k) = \binom{m}{x}\binom{N-m}{k-x} / \binom{N}{k}$ with N = m + n.
- The hypergeometric test is identical to the corresponding one-tailed version of Fisher's exact test.
- ex). The 2x2 contingency table for Fisher's exact test = matrix(c(n.Found, n.GS-n.Found, n.drawn-n.Found, n.NotGS-(n.drawn-n.Found)), nrow=2, dimnames = list(inputGenes = c("Found", "NotFound"), GeneUniverse = c("GS", "nonGS")))
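The equivalence with the one-tailed Fisher's exact test can be checked numerically. The sketch below builds the same 2x2 table in Python, sums the probabilities of all tables with the same margins whose top-left cell is at least as large as observed, and compares the result with the phyper-style tail. All function names are hypothetical, and this is an illustration under the table's symbol definitions rather than the pipeline's code.

```python
from math import comb, isclose


def contingency_table(x, m, n, k):
    """2x2 table: rows = in input list / not, columns = in gene set / not."""
    return [[x, k - x], [m - x, n - (k - x)]]


def fisher_greater(table):
    """One-sided Fisher p-value: P(top-left cell >= observed), margins fixed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    ways = sum(
        comb(row1, i) * comb(row2, col1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return ways / total


def phyper_upper(x, m, n, k):
    """The same tail written the phyper way: k draws from m GS genes and n others."""
    ways = sum(comb(m, i) * comb(n, k - i) for i in range(x, min(m, k) + 1))
    return ways / comb(m + n, k)


# First table row: x = 5, m = 70, n = 45886, k = 64
table = contingency_table(x=5, m=70, n=45886, k=64)
print(table)
print(isclose(fisher_greater(table), phyper_upper(5, 70, 45886, 64), rel_tol=1e-9))
```

The two functions sum over different margins (gene-set genes among the input list versus input genes among the gene set), so their agreement reflects the symmetry of the hypergeometric distribution rather than the same formula evaluated twice.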
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.