This pipeline identifies pathway gene sets that significantly overlap a given gene list, using a hypergeometric test. The gene set database is the GSEA MSigDB Class2: Canonical Pathways collection. For further details about the MSigDB gene sets, please visit The Broad Institute GSEA MSigDB.
For the given gene list, a hypergeometric test was run against 1320 canonical pathway gene sets to find significant overlaps. The top 5 overlapping gene sets, ranked by FDR-adjusted p-value, are listed below.
- PID_ENDOTHELINPATHWAY, PID_TXA2PATHWAY, REACTOME_REGULATION_OF_INSULIN_SECRETION_BY_ACETYLCHOLINE, PID_THROMBIN_PAR4_PATHWAY, KEGG_CALCIUM_SIGNALING_PATHWAY
GS(gene set) pathway name | gene.list | GS size (m) | n.NotInGS (n) | Gene universe (N) | n.drawn (k) | n.found (x) | p.value (p(X>=x)) | FDR (q.value) |
---|---|---|---|---|---|---|---|---|
PID ENDOTHELINPATHWAY | gene.list | 63 | 45893 | 45956 | 10 | 3 | 2.926e-07 | 0.0001931 |
PID TXA2PATHWAY | gene.list | 57 | 45899 | 45956 | 10 | 3 | 2.157e-07 | 0.0001931 |
REACTOME REGULATION OF INSULIN SECRETION BY ACETYLCHOLINE | gene.list | 11 | 45945 | 45956 | 10 | 2 | 2.341e-06 | 0.0010300 |
PID THROMBIN PAR4 PATHWAY | gene.list | 15 | 45941 | 45956 | 10 | 2 | 4.468e-06 | 0.0014740 |
KEGG CALCIUM SIGNALING PATHWAY | gene.list | 178 | 45778 | 45956 | 10 | 3 | 6.720e-06 | 0.0015300 |
PID S1P META PATHWAY | gene.list | 21 | 45935 | 45956 | 10 | 2 | 8.930e-06 | 0.0015300 |
PID S1P S1P2 PATHWAY | gene.list | 24 | 45932 | 45956 | 10 | 2 | 1.173e-05 | 0.0015300 |
REACTOME GASTRIN CREB SIGNALLING PATHWAY VIA PKC AND MAPK | gene.list | 205 | 45751 | 45956 | 10 | 3 | 1.026e-05 | 0.0015300 |
REACTOME G ALPHA Q SIGNALLING EVENTS | gene.list | 184 | 45772 | 45956 | 10 | 3 | 7.422e-06 | 0.0015300 |
REACTOME ADP SIGNALLING THROUGH P2RY1 | gene.list | 25 | 45931 | 45956 | 10 | 2 | 1.275e-05 | 0.0015300 |
REACTOME THROMBOXANE SIGNALLING THROUGH TP RECEPTOR | gene.list | 23 | 45933 | 45956 | 10 | 2 | 1.076e-05 | 0.0015300 |
PID S1P S1P3 PATHWAY | gene.list | 29 | 45927 | 45956 | 10 | 2 | 1.725e-05 | 0.0018970 |
REACTOME SIGNAL AMPLIFICATION | gene.list | 31 | 45925 | 45956 | 10 | 2 | 1.975e-05 | 0.0019860 |
REACTOME THROMBIN SIGNALLING THROUGH PROTEINASE ACTIVATED RECEPTORS PARS | gene.list | 32 | 45924 | 45956 | 10 | 2 | 2.106e-05 | 0.0019860 |
ST ADRENERGIC | gene.list | 36 | 45920 | 45956 | 10 | 2 | 2.674e-05 | 0.0022060 |
PID ARF6 PATHWAY | gene.list | 35 | 45921 | 45956 | 10 | 2 | 2.526e-05 | 0.0022060 |
PID ER NONGENOMIC PATHWAY | gene.list | 41 | 45915 | 45956 | 10 | 2 | 3.479e-05 | 0.0027010 |
PID THROMBIN PAR1 PATHWAY | gene.list | 43 | 45913 | 45956 | 10 | 2 | 3.830e-05 | 0.0028090 |
PID LYSOPHOSPHOLIPID PATHWAY | gene.list | 66 | 45890 | 45956 | 10 | 2 | 9.073e-05 | 0.0063040 |
KEGG LONG TERM DEPRESSION | gene.list | 70 | 45886 | 45956 | 10 | 2 | 1.021e-04 | 0.0067390 |
- Gene set database = c2.cp.v4.0.symbols.gmt
- Input gene list = MutSig2CV.input.genenames.txt
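
The FDR (q.value) column in the table above is consistent with a Benjamini-Hochberg adjustment over the 1320 tested gene sets, although the report does not name the method. The following is a minimal sketch of that adjustment, assuming Benjamini-Hochberg is what the pipeline applies; the function name `bh_adjust` and the toy p-values are illustrative and do not come from the pipeline.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    # Indices sorted from the largest p-value down to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = n - pos  # 1-based rank of this p-value in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (illustrative only, not taken from the report).
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

As a check against the table: the second-smallest p-value, 2.926e-07, scaled by 1320 / 2 gives about 1.931e-04, which is the FDR reported for the two top-ranked gene sets.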
For a given gene list, the pipeline uses a hypergeometric test to assess the significance of its overlap with each pathway gene set. The hypergeometric p-value is obtained with the R function phyper() and is defined as the probability of randomly drawing x or more successes (gene matches) in k total draws (the input genes) from a population of N genes, m of which belong to the gene set.
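
The tail probability described above can be sketched with exact binomial coefficients. This is an illustrative standard-library calculation, not the pipeline's own code (which calls R's phyper()); the function name `hypergeom_upper_tail` is hypothetical, and the arguments follow the table's column symbols: x found, m gene-set size, n genes not in the gene set, k genes drawn.

```python
from math import comb


def hypergeom_upper_tail(x: int, m: int, n: int, k: int) -> float:
    """Return P(X >= x) when k genes are drawn without replacement
    from a universe holding m gene-set genes and n other genes."""
    total = comb(m + n, k)  # ways to draw k genes from N = m + n
    # Sum the exact probabilities of x, x+1, ..., min(m, k) matches.
    favourable = sum(
        comb(m, i) * comb(n, k - i)
        for i in range(max(x, 0), min(m, k) + 1)
    )
    return favourable / total


# First table row: m = 63, n = 45893, k = 10 drawn, x = 3 found.
p = hypergeom_upper_tail(3, 63, 45893, 10)
print(f"{p:.3e}")  # compare with the 2.926e-07 reported in the table
```

Working with integer counts and dividing only once avoids the rounding problems that arise when very small probabilities are multiplied term by term.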
- Cumulative p-value using the R function phyper():
- Example: the probability of seeing at least x genes in the group is p(X>=x) = 1 - p(X<=x-1) = phyper(x-1, m, n, k, lower.tail=FALSE, log.p=FALSE), where n = N - m and the probability of exactly x matches is $f(x \mid N, m, k) = \binom{m}{x} \binom{N-m}{k-x} / \binom{N}{k}$.
- The hypergeometric test is identical to the corresponding one-tailed version of Fisher's exact test.
- Example: the contingency table for Fisher's exact test is matrix(c(n.Found, n.GS-n.Found, n.drawn-n.Found, n.NotGS-(n.drawn-n.Found)), nrow=2, dimnames = list(inputGenes = c("Found", "NotFound"), GeneUniverse = c("GS", "nonGS"))), where n.Found = x, n.GS = m, n.NotGS = n, and n.drawn = k.
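
To illustrate the equivalence stated above, the sketch below builds the same four cell counts as the R matrix and computes the one-tailed Fisher p-value directly from the table's margins. The helper names `contingency_table` and `fisher_one_sided` are hypothetical, and the rows-by-columns layout (input list membership by gene set membership) is one convention among several that give the same result.

```python
from math import comb


def contingency_table(x, m, n, k):
    """2x2 table: rows = in input list / not in input list,
    columns = in gene set / not in gene set."""
    return [[x, k - x], [m - x, n - (k - x)]]


def fisher_one_sided(table):
    """P(top-left cell >= observed) with all margins held fixed."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    # Sum over every table at least as extreme as the observed one.
    numer = sum(
        comb(col1, i) * comb(total - col1, row1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return numer / denom


# First table row of the report: x = 3, m = 63, n = 45893, k = 10.
table = contingency_table(x=3, m=63, n=45893, k=10)
print(table)  # [[3, 7], [60, 45886]]
print(f"{fisher_one_sided(table):.3e}")  # same tail as the hypergeometric test
```

Because the row and column totals recover k, m, and N, the sum is the hypergeometric upper tail, which is why the two tests report the same p-value.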
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.