Significant over-representation of pathway genesets for a given gene list
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (Primary solid tumor)
21 August 2015  |  analyses__2015_08_21
Maintainer Information
Maintained by Juok Cho (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2015): Significant over-representation of pathway genesets for a given gene list. Broad Institute of MIT and Harvard. doi:10.7908/C1P84B2K

This pipeline identifies pathway gene sets that significantly overlap a given gene list, using a hypergeometric test. The gene set database is GSEA MSigDB Class 2: Canonical Pathways. For further details about the MSigDB gene sets, please visit the Broad Institute GSEA MSigDB website.


For a given gene list, a hypergeometric test was used to find significantly overlapping canonical pathway gene sets. In terms of FDR-adjusted p-values, no significantly overlapping gene sets were found.

For the given gene list, there are no significantly overlapping canonical pathway gene sets.
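The report does not say which FDR procedure or cutoff was applied. As a minimal sketch, assuming the Benjamini-Hochberg adjustment (the method R's p.adjust() calls "BH" or "fdr") and a hypothetical cutoff of 0.05, the adjustment of a list of per-gene-set p-values could look like this. The function name bh_adjust and the toy p-values are illustrative only, not taken from the pipeline.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values, one per gene set (made up for illustration).
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_adjust(pvals)

FDR_CUTOFF = 0.05  # assumed threshold; the report does not state one
significant = [i for i, q in enumerate(qvals) if q < FDR_CUTOFF]
print(qvals, significant)
```

A gene set would be reported as significant only if its adjusted value falls below the chosen cutoff; an empty list corresponds to the "no significant gene sets" outcome described above.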
Methods & Data
  • Gene set database = c2.cp.v3.0-2.symbols.gmt

  • Input gene list = sig_genes.txt
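As a rough illustration of how these two inputs might be combined, the sketch below parses gene sets in GMT format (one gene set per line, tab-separated: name, description, then gene symbols) and intersects each set with an input gene list. It assumes sig_genes.txt holds one gene symbol per line, which the report does not confirm; the function names and the toy file contents are hypothetical.

```python
import io

def parse_gmt(handle):
    """Parse GMT text into {gene_set_name: set_of_gene_symbols}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description
        gene_sets[name] = {g for g in genes if g}
    return gene_sets

def read_gene_list(handle):
    """Read one gene symbol per line, ignoring blank lines (assumed layout)."""
    return {line.strip() for line in handle if line.strip()}

# In-memory stand-ins for the two input files (made-up content).
gmt_text = (
    "PATHWAY_A\thttp://example.org/a\tTP53\tMYC\tBCL2\n"
    "PATHWAY_B\thttp://example.org/b\tEGFR\tKRAS\n"
)
gene_text = "MYC\nBCL2\nCD79B\n"

gene_sets = parse_gmt(io.StringIO(gmt_text))
input_genes = read_gene_list(io.StringIO(gene_text))

# Genes shared between each pathway gene set and the input list.
overlaps = {name: sorted(genes & input_genes) for name, genes in gene_sets.items()}
print(overlaps)
```

With real files, the io.StringIO objects would be replaced by open() calls on the .gmt file and the gene list.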

Hypergeometric Test

For a given gene list, the pipeline uses a hypergeometric test to assess the significance of each overlapping pathway gene set. The hypergeometric p-value is obtained with the R function phyper() and is defined as the probability of randomly drawing x or more successes (gene matches) in k total draws (the input genes) from a population of N genes, m of which belong to the gene set.

  • a cumulative p-value computed with lower.tail=TRUE in phyper():

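    • ex). the probability of seeing at least 3 genes in the group is p(x>=3) = 1 - p(x<=2) = 1 - phyper(2, m, N-m, k, lower.tail=TRUE), where the probability of exactly x matches is $f(x \mid N, m, k) = \binom{m}{x}\binom{N-m}{k-x} / \binom{N}{k}$, with N the population size, m the number of genes in the gene set, and k the number of draws.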

  • The hypergeometric test is identical to the corresponding one-tailed version of Fisher's exact test.

    • ex). contingency table for Fisher's exact test = matrix(c(n.Found, n.GS-n.Found, n.drawn-n.Found, n.NotGS- (n.drawn-n.Found)), nrow=2, dimnames = list(inputGenes = c("Found", "NotFound"),GeneUniverse = c("GS", "nonGS")) )
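The upper-tail calculation described above can be sketched in plain Python for readers without R at hand. This is an illustrative reimplementation, not the pipeline's code: it computes P(X >= x) as 1 - P(X <= x-1), mirroring 1 - phyper(x-1, m, N-m, k), and builds the 2x2 table with the same layout as the matrix() call (rows Found/NotFound, columns GS/nonGS). The function names and the toy numbers (N=20, m=5, k=6, x=3) are assumptions chosen for the example.

```python
from math import comb

def hypergeom_pmf(x, N, m, k):
    """P(X = x): exactly x of the k drawn genes fall in a gene set of size m."""
    if x < max(0, k - (N - m)) or x > min(m, k):
        return 0.0  # outside the support of the distribution
    return comb(m, x) * comb(N - m, k - x) / comb(N, k)

def hypergeom_upper_tail(x, N, m, k):
    """P(X >= x), computed as 1 - P(X <= x - 1)."""
    lower = sum(hypergeom_pmf(i, N, m, k) for i in range(x))
    return 1.0 - lower

def fisher_table(n_found, n_gs, n_drawn, N):
    """2x2 counts: rows = Found / NotFound, columns = GS / nonGS."""
    n_not_gs = N - n_gs
    return [
        [n_found, n_drawn - n_found],
        [n_gs - n_found, n_not_gs - (n_drawn - n_found)],
    ]

# Toy example: 20 genes in the universe, a gene set of 5, 6 input genes, 3 matches.
N, m, k, x = 20, 5, 6, 3
p_value = hypergeom_upper_tail(x, N, m, k)
table = fisher_table(n_found=x, n_gs=m, n_drawn=k, N=N)
print(p_value, table)
```

The four cells of the table always sum to N, which is a quick sanity check when assembling the input for a one-tailed Fisher's exact test.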

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

[1] Johnson, N.L., et al., Univariate Discrete Distributions, Second Edition, Wiley (1992)
[2] Berkopec, Aleš, HyperQuick algorithm for discrete hypergeometric distribution, Journal of Discrete Algorithms:341-347 (2007)
[3] Tamayo, et al., Molecular Signatures Database, MSigDB, PNAS:15545-15550 (2005)