Correlations between copy number and mRNAseq expression

Liver Hepatocellular Carcinoma (Primary solid tumor)

21 April 2013 | analyses__2013_04_21

Maintainer Information

Maintained by TCGA GDAC Team (Broad Institute/MD Anderson Cancer Center/Harvard Medical School)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2013): Liver Hepatocellular Carcinoma (Primary solid tumor cohort) - 21 April 2013: Correlations between copy number and mRNAseq expression. Broad Institute of MIT and Harvard. doi:10.7908/C1CF9N24

Overview

Introduction

Each TCGA sample is profiled for both copy number variation and gene expression. This pipeline correlates the copy number and RNAseq data of each gene across samples to determine whether copy number variations are accompanied by differential expression. This report contains the correlation coefficients calculated from the genomic copy number (log2) values and the RNAseq expression of the corresponding feature across patients. Strongly positive or strongly negative correlation coefficients indicate that genomic alterations are associated with differences in the expression of the mRNA that the affected genomic regions transcribe.

Summary

The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles are reported as 883, 1995, 2664.1, 3269, 3905, 4548, 5221.9, 5951, and 6795, respectively.
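A decile summary of this kind can be sketched with the Python standard library. This is a minimal illustration, not the pipeline's own code: the function name `decile_summary`, the choice of the "inclusive" interpolation method, and the example values are all assumptions. Pearson coefficients are bounded by -1 and 1, so the cut points computed this way stay within that range.

```python
import statistics


def decile_summary(coefficients):
    """Return the nine cut points at the 10th..90th percentiles.

    Hypothetical helper: the report does not say which interpolation
    rule was used, so the "inclusive" method is an assumption.
    """
    return statistics.quantiles(coefficients, n=10, method="inclusive")


# Made-up per-gene correlation coefficients, not values from this report.
example = [i / 10 for i in range(-5, 6)]
deciles = decile_summary(example)

for percent, value in zip(range(10, 100, 10), deciles):
    print(f"{percent}th percentile: {value:.2f}")
```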

Results

Correlation results

The numbers of genes and samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of those coefficients against a normal distribution. Table 2 shows the top 20 features ranked by correlation coefficient.

Table 1. Numbers of genes and samples in the copy number and expression data sets, and common to both

Category	Copy number	Expression	Common
Samples	97	69	69
Genes	23778	17848	17758
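The "Common" column in Table 1 is the overlap between the identifiers present in each data set. A minimal sketch of that count follows, assuming identifiers are plain strings; the function name `common_counts` and the sample IDs are hypothetical and do not come from the report.

```python
def common_counts(copy_number_ids, expression_ids):
    """Count identifiers in each data set and in their intersection."""
    cn = set(copy_number_ids)
    expr = set(expression_ids)
    return {
        "copy_number": len(cn),
        "expression": len(expr),
        "common": len(cn & expr),  # identifiers present in both sets
    }


# Hypothetical sample identifiers, for illustration only.
cn_samples = ["S01", "S02", "S03", "S04"]
expr_samples = ["S02", "S03", "S05"]
sample_counts = common_counts(cn_samples, expr_samples)
print(sample_counts)
```

The same helper applies to gene identifiers, which gives the second row of the table.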

Figure 1. Summary figures. Left: histogram showing the distribution of the calculated correlations across samples for all genes. Right: QQ plot of the calculated correlations across samples, plotting the quantiles of the calculated correlation coefficients against those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
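The coordinates behind a normal QQ plot like the right-hand panel can be sketched as below. This is an illustration under stated assumptions, not the report's plotting code: the reference normal is fitted to the sample mean and standard deviation, the plotting positions are (i + 0.5) / n, and the name `normal_qq_points` and the input values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each ordered value with the matching normal quantile.

    Assumption: the reference distribution is a normal fitted to the
    sample mean and standard deviation.
    """
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid probabilities of exactly 0 or 1.
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up correlation coefficients, for illustration only.
example = [0.5, -0.2, 0.62, 0.1, 0.35]
points = normal_qq_points(example)
for theoretical_q, observed_q in points:
    print(f"{theoretical_q:.3f}\t{observed_q:.3f}")
```

Points that fall on the line y = x match the fitted normal; systematic departures at the tails are what the caption describes as deviation from normality.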

Table 2. Top 20 features (defined by the feature column) ranked by correlation coefficients

Locus ID	Gene Symbol	Cytoband	cor
91782	CHMP7	8p21.3	0.91
220064	ORAOV1	11q13.3	0.8884
10987	COPS5	8q13.1	0.8806
5516	PPP2CB	8p12	0.8732
57805	KIAA1967	8p21.3	0.8608
10671	DCTN6	8p12	0.8538
51001	MTERFD1	8q22.1	0.8506
60528	ELAC2	17p12	0.8492
5828	PEX2	8q21.11	0.8469
7257	TSNAX	1q42.2	0.8457
2339	FNTA	8p11.21	0.8454
7323	UBE2D3	4q24	0.8453
80185	TTI2	8p12	0.8447
92140	MTDH	8q22.1	0.8446
57226	LYRM2	6q15	0.8393
8881	CDC16	13q34	0.8383
3551	IKBKB	8p11.21	0.8382
375	ARF1	1q42.13	0.838
84933	C8orf76	8q24.13	0.8355
116150	NUS1	6q22.1	0.8354
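The ranking in Table 2 amounts to sorting features by coefficient in descending order and keeping the first entries. A minimal sketch follows, using four rows copied from the table in shuffled order; the function name `top_features` and the tuple layout are assumptions made for illustration.

```python
def top_features(rows, n=20):
    """Return the n rows with the largest correlation coefficient.

    Each row is (locus_id, gene_symbol, cytoband, cor); this layout is
    an assumption, not the format of the downloadable results.
    """
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]


# Four rows taken from Table 2, deliberately out of order.
rows = [
    (10987, "COPS5", "8q13.1", 0.8806),
    (91782, "CHMP7", "8p21.3", 0.91),
    (5516, "PPP2CB", "8p12", 0.8732),
    (220064, "ORAOV1", "11q13.3", 0.8884),
]
top = top_features(rows, n=3)
for locus_id, symbol, cytoband, cor in top:
    print(locus_id, symbol, cytoband, cor, sep="\t")
```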

Methods & Data

Input

The input consists of gene-level (TCGA Level III) mRNAseq expression data and copy number data for the corresponding genes, derived by the GISTIC pipeline. Pearson correlation coefficients were calculated for each gene shared by the two data sets, pairing its copy number with its expression across all the samples common to both.

Correlation across samples

Pairwise correlations between the log2 copy numbers and expressions of each gene across samples were calculated using Pearson correlation.
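The per-gene calculation can be sketched in plain Python as follows. This is a minimal illustration, not the pipeline's implementation: the nested-dictionary layout (gene, then sample, then value), the function names `pearson` and `correlate_genes`, and the example numbers are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def correlate_genes(copy_number, expression):
    """Correlate log2 copy number with expression for each shared gene.

    Both inputs map gene -> {sample: value} (hypothetical layout). Only
    genes and samples present in both data sets are used.
    """
    result = {}
    for gene in sorted(copy_number.keys() & expression.keys()):
        samples = sorted(copy_number[gene].keys() & expression[gene].keys())
        x = [copy_number[gene][s] for s in samples]
        y = [expression[gene][s] for s in samples]
        result[gene] = pearson(x, y)
    return result


# Made-up values: GENE_A is shared, GENE_B and GENE_C are not.
copy_number = {
    "GENE_A": {"S1": -1.0, "S2": 0.0, "S3": 1.0},
    "GENE_B": {"S1": 0.2, "S2": 0.1, "S3": 0.4},
}
expression = {
    "GENE_A": {"S1": 2.0, "S2": 4.0, "S3": 6.0, "S4": 9.0},
    "GENE_C": {"S1": 5.0, "S2": 5.5, "S3": 7.0},
}
correlations = correlate_genes(copy_number, expression)
print(correlations)
```

Restricting each gene to the samples common to both data sets mirrors the "Common" column of Table 1; sample S4 is ignored here because it has no copy number value.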

Download Results

This is an experimental feature. The full results of the analysis summarized in this report can be downloaded from the TCGA Data Coordination Center.
