Identification of putative miR direct targets by sequencing data
Lung Adenocarcinoma (Primary solid tumor)
02 April 2015  |  analyses__2015_04_02
Maintainer Information
Maintained by Hailei Zhang (Broad Institute)
Citation Information
Cite as Broad Institute TCGA Genome Data Analysis Center (2015): Identification of putative miR direct targets by sequencing data. Broad Institute of MIT and Harvard. doi:10.7908/C1G44PCH
Overview
Introduction

This pipeline infers putative direct gene targets of miRs from miRseq and mRNAseq expression profiles across multiple samples. It has the following steps:

  1. Apply the CLR approach[8] to infer putative miR:gene regulatory connections.

  2. Filter miR:gene pairs by Pearson correlation (<= -0.3).

  3. Filter miR:gene pairs by predicted interactions in three sequence prediction databases (Miranda, Pictar, Targetscan).

Summary

The CLR algorithm was applied to 789 miRs and 18319 mRNAs across 447 samples. After the two filtering steps, 147 miR:gene pairs were detected.

Results
Significant miR:gene pairs

Figure 1.  All miR hubs with their strong anti-correlated genes and predicted interactions in three sequence prediction databases.

Table 1.  List of miR:gene pairs with corr < -0.30 and predicted interactions in three sequence prediction databases.

Mir Gene Corr p q Prediction.DBs Miranda Pictar Targetscan Total
hsa-miR-1-3p HSPD1 -0.4 3.5e-07 5.3e-07 miranda,pictar,targetscan 1 1 1 3
hsa-miR-1-3p SFRS9 -0.38 1e-08 2.3e-08 miranda,pictar,targetscan 1 1 1 3
hsa-miR-101-3p RAD54L -0.35 2.5e-06 3e-06 miranda,pictar,targetscan 1 1 1 3
hsa-miR-101-3p EZH2 -0.35 1.7e-06 2.1e-06 miranda,pictar,targetscan 1 1 1 3
hsa-miR-101-3p DNMT3A -0.34 1.2e-06 1.6e-06 miranda,pictar,targetscan 1 1 1 3
hsa-miR-141-3p SEPT7 -0.36 4e-11 1.5e-10 miranda,pictar,targetscan 1 1 1 3
hsa-miR-141-3p FRMD6 -0.36 3.2e-08 6.2e-08 miranda,pictar,targetscan 1 1 1 3
hsa-miR-141-3p PTPRG -0.35 3.9e-10 1.2e-09 miranda,pictar,targetscan 1 1 1 3
hsa-miR-141-3p TNS3 -0.33 1.1e-16 1e-15 miranda,pictar,targetscan 1 1 1 3
hsa-miR-141-3p WIPF1 -0.32 1.9e-06 2.3e-06 miranda,pictar,targetscan 1 1 1 3
miR connections

Table 2.  All miR hubs with their associated genes in the putative direct target network.

Mir Number.of.Genes Genes
hsa-miR-30b-5p 23 B3GNT5, CALU, CAPZA1, CUL2, DNAJC13, E2F7, EAF1, ERLIN1, FNDC3B, FXR1, KIAA1033, KPNA3, NUS1, PAWR, PPP1R12A, PSME3, R3HDM1, RAD23B, SEC23A, SSX2IP, STC1, TAOK1, ATP2B1
hsa-miR-30d-5p 19 B3GNT5, CALU, CARS, DRP2, E2F7, EED, ERLIN1, FLJ36031, GFPT2, ITGA5, MYBL2, PDSS1, R3HDM1, SEC23A, SLC38A7, SNAI1, SSX2IP, STC1, ATP2B1
hsa-miR-30c-5p 17 CUL2, EAF1, ERLIN1, KIAA1033, KRAS, MTDH, PAPOLA, PAWR, PPP1R12A, PSME3, R3HDM1, RAD23B, ROD1, SEC23A, TAOK1, YWHAZ, CALU
hsa-miR-200b-3p 10 ASAP1, CALU, CFL2, CORO1C, MMD, NRBP1, PTPN12, SEC23A, YWHAG, ANLN
hsa-miR-141-3p 9 FRMD6, NRP1, PTPRG, SEPT7, TNS3, TSHZ3, WIPF1, ZEB2, DNAJC13
hsa-miR-200c-3p 8 GIT2, MMD, NCS1, NIN, PALM2-AKAP2, TLN1, WIPF1, CORO1C
hsa-miR-96-5p 8 GIT2, JMJD1C, MITF, PCGF5, SEMA6A, TACC1, VCL, FOXO1
hsa-miR-29a-3p 7 DNMT3A, DNMT3B, E2F7, MYBL2, TDG, USP37, BLMH
hsa-miR-18a-5p 6 CTDSPL, KLHL20, NR3C1, RAB11FIP2, TEX2, CREBL2
hsa-miR-26a-5p 6 EZH2, GTF3C2, KPNA2, SETD8, ULK1, CHAF1A
Gene connections

Table 3.  All gene hubs with their associated miRs in the putative direct target network.

Gene Number.of.Mirs Mirs
SEC23A 4 hsa-miR-200b-3p, hsa-miR-30b-5p, hsa-miR-30d-5p, hsa-miR-30c-5p
CALU 4 hsa-miR-200b-3p, hsa-miR-30b-5p, hsa-miR-30d-5p, hsa-miR-30c-5p
E2F7 3 hsa-miR-30b-5p, hsa-miR-30d-5p, hsa-miR-29a-3p
GIT2 3 hsa-miR-182-5p, hsa-miR-200c-3p, hsa-miR-96-5p
R3HDM1 3 hsa-miR-30b-5p, hsa-miR-30d-5p, hsa-miR-30c-5p
MYBL2 3 hsa-miR-29c-3p, hsa-miR-30d-5p, hsa-miR-29a-3p
ERLIN1 3 hsa-miR-30b-5p, hsa-miR-30d-5p, hsa-miR-30c-5p
DNMT3A 3 hsa-miR-101-3p, hsa-miR-29c-3p, hsa-miR-29a-3p
B3GNT5 2 hsa-miR-30b-5p, hsa-miR-30d-5p
KIAA1033 2 hsa-miR-30b-5p, hsa-miR-30c-5p
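
The hub tables above (Tables 2 and 3) are two views of the same list of miR:gene pairs: one grouped by miR, one grouped by gene. As a minimal sketch of that grouping, and not the pipeline's actual code, the function below counts partners per hub and sorts the largest hubs first. The function name `hub_table` and the tie-breaking rule are assumptions made for illustration; the four example pairs are taken from Table 2.

```python
from collections import defaultdict


def hub_table(pairs, key_index):
    """Group (miR, gene) pairs by miR (key_index=0) or by gene (key_index=1).

    Returns a list of (hub, number_of_partners, sorted_partner_names).
    """
    partners = defaultdict(set)
    for pair in pairs:
        partners[pair[key_index]].add(pair[1 - key_index])
    rows = [(hub, len(members), sorted(members)) for hub, members in partners.items()]
    # Largest hubs first; ties broken alphabetically (an illustrative choice).
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# A few pairs taken from Table 2, used here only as a small example.
pairs = [
    ("hsa-miR-30b-5p", "CALU"),
    ("hsa-miR-30d-5p", "CALU"),
    ("hsa-miR-30b-5p", "SEC23A"),
    ("hsa-miR-200b-3p", "CALU"),
]
mir_hubs = hub_table(pairs, key_index=0)   # one row per miR, as in Table 2
gene_hubs = hub_table(pairs, key_index=1)  # one row per gene, as in Table 3
```

Using a set per hub means a pair listed twice is counted once, which seems the natural reading of "Number.of.Genes" and "Number.of.Mirs".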
Methods & Data
Input

The following files and database versions were used as input.

  • miRseq (at precursor level), log2-transformed RPM values (reads per million reads aligned to miRBase precursor) = /xchip/cga/gdac-prod/tcga-gdac/jobResults/miRseq_mature_preprocess/LUAD-TP/14527078/LUAD-TP.miRseq_mature_RPM_log2.txt

  • mRNAseq, log2-transformed RSEM/RPKM values = /xchip/cga/gdac-prod/tcga-gdac/jobResults/mRNAseq_preprocessor/LUAD-TP/14527079/LUAD-TP.uncv2.mRNAseq_RSEM_normalized_log2.txt

  • miR:gene predicted interactions file = /xchip/cga_home/hailei/FH/CLR/human_interactions.predicted.v2.mirbase21.txt

  • Miranda = microrna.org Aug 2010 release, Microcosm version 5

  • Pictar = version 1

  • Targetscan = release 5.2

CLR method

The CLR (Context Likelihood of Relatedness) algorithm builds upon relevance network strategies by adding a background correction step. After computing the mutual information between regulators and their potential target genes, CLR calculates the statistical likelihood of each mutual information value within its network context. The algorithm compares the mutual information of a miR/gene pair to the background distribution of mutual information scores for all possible miR/gene pairs that include either the miR or its target. After this background correction, the most probable interactions are those whose mutual information scores stand significantly above the background distribution of mutual information scores[8].
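
The background correction described above can be sketched as follows. This is a minimal illustration and not the pipeline's implementation: it assumes the mutual information values have already been computed and stored in a miR-by-gene matrix, and it uses the commonly cited CLR form in which each value is converted to a z-score against its row (all pairs including the miR) and its column (all pairs including the gene), negative z-scores are clipped to zero, and the two are combined as the square root of the sum of squares. The function name `clr_scores` and the example values are hypothetical.

```python
import numpy as np


def clr_scores(mi):
    """Background-corrected scores from a mutual information matrix.

    mi: 2-D array with one row per miR and one column per gene.
    Returns an array of the same shape; larger values stand further
    above both the miR's and the gene's background distribution.
    """
    mi = np.asarray(mi, dtype=float)
    # Background for each miR: every MI value in its row.
    row_mean = mi.mean(axis=1, keepdims=True)
    row_std = mi.std(axis=1, keepdims=True)
    # Background for each gene: every MI value in its column.
    col_mean = mi.mean(axis=0, keepdims=True)
    col_std = mi.std(axis=0, keepdims=True)
    # A constant row or column carries no signal; avoid dividing by zero.
    row_std = np.where(row_std > 0, row_std, 1.0)
    col_std = np.where(col_std > 0, col_std, 1.0)
    z_row = np.maximum(0.0, (mi - row_mean) / row_std)
    z_col = np.maximum(0.0, (mi - col_mean) / col_std)
    return np.sqrt(z_row ** 2 + z_col ** 2)


# Toy matrix (2 miRs x 3 genes) with made-up MI values.
toy_mi = [[0.25, 0.25, 1.0],
          [0.25, 0.25, 0.25]]
toy_scores = clr_scores(toy_mi)
```

In the toy matrix only the pair in the first row and third column stands above both of its backgrounds, so it is the only pair with a non-zero score.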

Pearson correlation

Pairwise Pearson correlation coefficients are first computed for all miR:gene pairs. Genes whose correlation with a particular miR is below the user-defined threshold (-0.3), and which are predicted as targets of that miR in three sequence-based prediction databases (Miranda[1][2], Pictar[3][4], TargetScan[5][6][7]), are identified as putative direct targets of that miR. The inferred direct target miR:gene network comprises all such putative direct associations.

  • threshold = -0.3
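
The two filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's code: expression matrices are assumed to have one row per miR or gene and one column per sample, the predicted interactions are assumed to be available as a dictionary from (miR, gene) to the set of databases predicting the pair, and `min_databases=3` assumes a pair must be predicted by all three databases. The comparison is strict (`<`), following the Methods text. All function names, parameter names, and example values are hypothetical.

```python
import numpy as np


def pearson_matrix(mir_expr, gene_expr):
    """Pearson correlation of every miR row with every gene row.

    Both inputs have one row per feature and one column per sample.
    Rows with constant expression yield NaN correlations.
    """
    a = np.asarray(mir_expr, dtype=float)
    b = np.asarray(gene_expr, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def putative_targets(corr, mir_names, gene_names, predicted,
                     threshold=-0.3, min_databases=3):
    """Keep pairs that are anti-correlated and predicted by enough databases."""
    hits = []
    for i, mir in enumerate(mir_names):
        for j, gene in enumerate(gene_names):
            dbs = predicted.get((mir, gene), set())
            # NaN correlations compare as False and are therefore dropped.
            if corr[i, j] < threshold and len(dbs) >= min_databases:
                hits.append((mir, gene, float(corr[i, j]), sorted(dbs)))
    return hits


# Toy data: one miR and two genes across four samples (made-up values).
toy_mir = [[1.0, 2.0, 3.0, 4.0]]
toy_genes = [[4.0, 3.0, 2.0, 1.0],   # falls as the miR rises
             [1.0, 2.0, 3.0, 4.0]]   # rises with the miR
all_three = {"miranda", "pictar", "targetscan"}
toy_predicted = {("miR-x", "GENE1"): all_three, ("miR-x", "GENE2"): all_three}
toy_corr = pearson_matrix(toy_mir, toy_genes)
toy_hits = putative_targets(toy_corr, ["miR-x"], ["GENE1", "GENE2"], toy_predicted)
```

Only GENE1 survives in the toy example: both genes are predicted targets, but GENE2 is positively correlated with the miR and fails the correlation filter.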

Download Results

In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.

References
[1] Betel D, Wilson M, Gabow A, Marks DS, Sander C, microRNA target predictions: The microRNA.org resource: targets and expression, Nucleic Acids Res 36:D149-53 (2008)
[2] John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS, miRanda: Human MicroRNA targets, PLoS Biol 3(7):e264 (2005)
[3] Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, Piedade ID, Gunsalus KC, Stoffel M, Rajewsky N, Combinatorial microRNA target predictions, Nature Genetics 37:495-500 (2005)
[4] Chen K, Rajewsky N, Natural selection on human microRNA binding sites inferred from SNP data, Nat Genet 38:1452-1456 (2006)
[5] Lewis BP, Burge CB, Bartel DP, Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets, Cell 120(1):15-20 (2005)
[6] Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP, MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing, Molecular Cell 27:91-105 (2007)
[7] Friedman RC, Farh KK, Burge CB, Bartel DP, Most Mammalian mRNAs Are Conserved Targets of MicroRNAs, Genome Research 19:92-105 (2009)
[8] Genovese G, Ergun A, Shukla SA, Campos B, Hanna J, Ghosh P, Quayle SN, Rai K, Colla S, Ying H, Wu CJ, Sarkar S, Xiao Y, Zhang J, Zhang H, Kwong L, Dunn K, Wiedemeyer WR, Brennan C, Zheng H, Rimm DL, Collins JJ, Chin L, microRNA regulatory network inference identifies miR-34a as a novel regulator of TGF-β signaling in glioblastoma, Cancer Discovery 2(8):736-49 (2012)