This pipeline infers putative direct gene targets of miRs based on miRseq and mRNAseq expression profiles across multiple samples. This pipeline has the following steps:
- The CLR approach[8] is applied to infer putative miR:gene regulatory connections.
- miR:gene pairs are filtered by Pearson correlation (<= -0.3).
- miR:gene pairs are filtered by predicted interactions in three sequence-based prediction databases (Miranda, Pictar, TargetScan).
The CLR algorithm was applied to 850 miRs and 18301 mRNAs across 120 samples. After the two filtering steps, 52 miR:gene pairs remained.
Mir | Gene | Corr | p | q | Prediction.DBs | Miranda | Pictar | Targetscan | Total |
---|---|---|---|---|---|---|---|---|---|
hsa-miR-101-3p | MORN4 | -0.58 | 9.4e-07 | 1.2e-06 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-107 | LATS2 | -0.61 | 7.8e-07 | 1.1e-06 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-107 | RSPO3 | -0.52 | 3.4e-06 | 3.6e-06 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-137 | RCAN2 | -0.4 | 1.4e-06 | 1.8e-06 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-141-3p | WIPF1 | -0.61 | 5.6e-09 | 1.6e-08 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-141-3p | STX2 | -0.58 | 2.4e-11 | 1.2e-10 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-141-3p | PCDH9 | -0.58 | 1e-12 | 6.8e-12 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-141-3p | SEPT7 | -0.52 | 2.2e-10 | 9e-10 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-142-3p | DIRC2 | -0.82 | 4.1e-08 | 8.8e-08 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
hsa-miR-142-3p | BOD1 | -0.71 | 2e-06 | 2.3e-06 | miranda,pictar,targetscan | 1 | 1 | 1 | 3 |
Mir | Number.of.Genes | Genes |
---|---|---|
hsa-miR-16-5p | 7 | CAMSAP1, CLDN12, COPS7A, ESRP1, MIPOL1, PTPN3, ANKRD57 |
hsa-miR-29b-3p | 6 | GAS7, GPX7, PHC1, SCHIP1, TET1, CCDC117 |
hsa-miR-200c-3p | 6 | MBOAT2, PCSK2, SOX1, WIPF1, ZEB1, FHOD1 |
hsa-miR-18a-5p | 5 | DIP2C, PDZD2, RABGAP1, ZBTB4, CDC14B |
hsa-miR-141-3p | 4 | SEPT7, STX2, WIPF1, PCDH9 |
hsa-miR-142-3p | 3 | COPS7A, DIRC2, BOD1 |
hsa-miR-25-3p | 2 | NPTN, NFIA |
hsa-miR-155-5p | 2 | MYO1D, MYO10 |
hsa-miR-31-5p | 2 | VAV3, CTNND2 |
hsa-miR-200b-3p | 2 | ZEB1, GIT2 |
Gene | Number.of.Mirs | Mirs |
---|---|---|
ZEB1 | 2 | hsa-miR-200c-3p,hsa-miR-200b-3p |
ZBTB4 | 2 | hsa-miR-18b-5p,hsa-miR-18a-5p |
TET1 | 2 | hsa-miR-29b-3p,hsa-miR-29a-3p |
COPS7A | 2 | hsa-miR-142-3p,hsa-miR-16-5p |
WIPF1 | 2 | hsa-miR-200c-3p,hsa-miR-141-3p |
PCSK2 | 1 | hsa-miR-200c-3p |
KIT | 1 | hsa-miR-193b-3p |
CLDN12 | 1 | hsa-miR-16-5p |
ESRP1 | 1 | hsa-miR-16-5p |
GIT2 | 1 | hsa-miR-200b-3p |
The following input files and prediction database versions were used:
- miRseq (at precursor level), log2-transformed RPM values (reads per million reads aligned to miRBase precursor) = /xchip/cga/gdac-prod/tcga-gdac/jobResults/miRseq_mature_preprocess/THYM-TP/22312609/THYM-TP.miRseq_mature_RPM_log2.txt
- mRNAseq, log2-transformed RSEM/RPKM values = /xchip/cga/gdac-prod/tcga-gdac/jobResults/mRNAseq_preprocessor/THYM-TP/22312606/THYM-TP.uncv2.mRNAseq_RSEM_normalized_log2.txt
- miR:gene predicted interactions file = /xchip/cga_home/hailei/FH/CLR/human_interactions.predicted.v2.mirbase21.txt
- Miranda = microrna.org Aug 2010 release, Microcosm version 5
- Pictar = version 1
- Targetscan = release 5.2
The CLR (Context Likelihood of Relatedness) algorithm builds on relevance network strategies by adding a background correction step. After computing the mutual information between regulators and their potential target genes, CLR calculates the statistical likelihood of each mutual information value within its network context. The algorithm compares the mutual information of a miR/gene pair to the background distribution of mutual information scores for all possible miR/gene pairs that include either the miR or its target. After this background correction, the most probable interactions are those whose mutual information scores stand significantly above the background distribution[8].
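The background correction described above can be illustrated with a minimal sketch. It assumes the mutual information values are already available as a miR-by-gene matrix, and it uses the commonly cited CLR formulation: a z-score against the miR's row, a z-score against the gene's column, negative values clipped to zero, and the two combined as a Euclidean norm. The function name `clr_scores` and the input layout are hypothetical and are not taken from the pipeline's code.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Background-correct a mutual information matrix, CLR style.

    mi[i][j] is the mutual information between miR i and gene j
    (hypothetical input layout, assumed for this sketch).
    """
    cols = list(zip(*mi))
    # Background distribution per miR (row) and per gene (column).
    row_mu = [mean(r) for r in mi]
    row_sd = [pstdev(r) for r in mi]
    col_mu = [mean(c) for c in cols]
    col_sd = [pstdev(c) for c in cols]

    def z(value, mu, sd):
        # Clip at zero: only values above the background count as evidence.
        return max(0.0, (value - mu) / sd) if sd > 0 else 0.0

    return [
        [
            math.hypot(
                z(mi[i][j], row_mu[i], row_sd[i]),
                z(mi[i][j], col_mu[j], col_sd[j]),
            )
            for j in range(len(cols))
        ]
        for i in range(len(mi))
    ]


# Toy matrix: one miR/gene pair stands out against a flat background.
toy_mi = [
    [0.1, 0.1, 0.9],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1],
]
toy_scores = clr_scores(toy_mi)
```

In the toy matrix, only the pair at row 0, column 2 receives a non-zero score, because it is the only value that rises above both its row and its column background.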
Pairwise Pearson correlation coefficients between all miR:gene pairs are first computed. All genes whose correlation with a particular miR is at or below the user-defined threshold (-0.3) and that have been predicted as targets of that miR in three sequence-based prediction databases (Miranda[1][2], Pictar[3][4], TargetScan[5][6][7]) are identified as putative direct targets of that miR. The resulting direct-target miR:gene network comprises all such putative direct associations.
- threshold = -0.3
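The correlation and prediction-database filters can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: expression profiles are plain lists keyed by name, `predicted` maps a (miR, gene) pair to the set of databases that predict it, and `min_dbs` is a hypothetical parameter that here requires support from all three databases. The comparison uses `<=`, matching the `<= -0.3` cutoff given in the step list.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def putative_targets(mir_expr, gene_expr, predicted, threshold=-0.3, min_dbs=3):
    """Keep miR:gene pairs that pass both filters.

    mir_expr / gene_expr: dict of name -> expression values across samples.
    predicted: dict of (mir, gene) -> set of database names (assumed layout).
    min_dbs: assumed number of databases that must predict the pair.
    """
    hits = []
    for mir, x in mir_expr.items():
        for gene, y in gene_expr.items():
            r = pearson(x, y)
            dbs = predicted.get((mir, gene), set())
            if r <= threshold and len(dbs) >= min_dbs:
                hits.append((mir, gene, round(r, 2), sorted(dbs)))
    return hits


# Toy data with made-up names: G1 is anti-correlated and predicted by all
# three databases, G2 is positively correlated, G3 lacks database support.
toy_mirs = {"miR-A": [1, 2, 3, 4]}
toy_genes = {"G1": [4, 3, 2, 1], "G2": [1, 2, 3, 4], "G3": [4, 3, 2, 1]}
toy_predicted = {
    ("miR-A", "G1"): {"miranda", "pictar", "targetscan"},
    ("miR-A", "G2"): {"miranda", "pictar", "targetscan"},
    ("miR-A", "G3"): {"miranda"},
}
toy_hits = putative_targets(toy_mirs, toy_genes, toy_predicted)
```

Only `G1` survives in the toy data: `G2` fails the correlation filter and `G3` fails the database filter.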
In addition to the links below, the full results of the analysis summarized in this report can also be downloaded programmatically using firehose_get, or interactively from either the Broad GDAC website or TCGA Data Coordination Center Portal.