Stomach Adenocarcinoma: Correlations between copy number and mRNAseq expression

Maintained by TCGA GDAC Team (Broad Institute/Dana-Farber Cancer Institute/Harvard Medical School)

Overview

Introduction

Each TCGA sample is profiled to detect copy number variations and gene expression. This pipeline correlates the copy number and RNAseq data of each gene across samples to determine whether copy number variations also result in differential expression. This report contains the correlation coefficients calculated from the genomic copy number (log2) values and the RNAseq expression of the corresponding feature across patients. High positive or strongly negative correlation coefficients indicate that genomic alterations result in differences in the expression of the mRNA that the genomic regions transcribe.

Summary

The correlation coefficients at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th and 90th percentiles are 1060.2, 2147, 2850.6, 3529.8, 4215, 4900, 5561, 6313 and 7139.8, respectively.
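As an illustration of how such a decile summary can be computed, here is a minimal sketch using only the Python standard library. The input values and the function name are made up for the example, and the interpolation rule (the "inclusive" method) is an assumption, since the report does not state which quantile definition the pipeline uses. Because Pearson coefficients lie in [-1, 1], deciles computed this way fall in that range as well.

```python
from statistics import quantiles

def correlation_deciles(cors):
    """Return the 10th..90th percentiles of a list of correlation coefficients.

    The 'inclusive' interpolation rule is an assumption made for this sketch.
    """
    return quantiles(cors, n=10, method="inclusive")

# Hypothetical coefficients spread evenly over [-1, 1] (not real report data).
example_cors = [(k - 50) / 50 for k in range(101)]
deciles = correlation_deciles(example_cors)

for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value:.2f}")
```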

Results

Correlation results

The numbers of genes and samples used for the calculation are shown in Table 1. Figure 1 shows the distribution of the calculated correlation coefficients and a quantile-quantile plot of those coefficients against a normal distribution. Table 2 shows the top 20 features ordered by correlation coefficient.

Table 1. Counts of genes and samples in the copy number and expression data sets, and common to both

Category	Copy number	Expression	Common
Sample	132	57	57
Genes	22749	19382	18553
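The "Common" column is the size of the intersection of the identifiers in the two data sets. A small sketch of that bookkeeping follows. The function name and the toy identifiers are hypothetical, and it assumes identifiers have already been normalized to a shared naming scheme.

```python
def overlap_counts(copy_number_ids, expression_ids):
    """Count identifiers in each data set and those shared by both."""
    cn = set(copy_number_ids)
    ex = set(expression_ids)
    return {
        "copy_number": len(cn),
        "expression": len(ex),
        "common": len(cn & ex),
    }

# Toy sample identifiers (hypothetical, not taken from the report).
cn_samples = ["S1", "S2", "S3", "S4", "S5"]
expr_samples = ["S2", "S4", "S6"]
sample_counts = overlap_counts(cn_samples, expr_samples)
print(sample_counts)
```

The same function applies to gene identifiers, which gives the second row of the table.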

Figure 1. Summary figures. Left: histogram showing the distribution of the correlations calculated across samples for all genes. Right: QQ plot of the calculated correlations. The QQ plot compares the quantiles of the calculated correlation coefficients with those of a normal distribution. Points deviating from the blue line indicate deviation from normality.
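The points of such a QQ plot can be derived as in the sketch below. It assumes a normal reference fitted with the sample mean and standard deviation, and plotting positions (i + 0.5) / n. Both are common conventions chosen for this example, not details confirmed by the report, and the function name and input values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each ordered value with the matching quantile of a fitted normal.

    Returns (theoretical_quantile, observed_value) tuples. Needs at least two
    non-identical values so that the standard deviation is positive.
    """
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    return [
        (reference.inv_cdf((i + 0.5) / n), observed)
        for i, observed in enumerate(ordered)
    ]

# Hypothetical correlation coefficients (not real report data).
qq_points = normal_qq_points([0.5, -0.2, 0.8, 0.1, 0.3])
for theoretical, observed in qq_points:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

Points that lie close to the line y = x are consistent with normality under these assumptions.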

Table 2. Top 20 features (defined by the feature column) ranked by correlation coefficients

Locus ID	Gene Symbol	Cytoband	cor
84060	RBM48	7q21.2	0.9432
84299	MIEN1	17q12	0.9348
889	KRIT1	7q21.2	0.9262
55717	WDR11	10q26.12	0.9233
54994	C20orf11	20q13.33	0.9186
4848	CNOT2	12q15	0.9143
9862	MED24	17q21.1	0.9139
6873	TAF2	8q24.12	0.9118
830	CAPZA2	7q31.2	0.9057
54467	ANKIB1	7q21.2	0.9008
10210	TOPORS	9p21.1	0.8981
5786	PTPRA	20p13	0.8947
55610	CCDC132	7q21.3	0.8888
79648	MCPH1	8p23.1	0.8858
10564	ARFGEF2	20q13.13	0.8833
4799	NFX1	9p13.3	0.8828
9777	TM9SF4	20q11.21	0.8821
5709	PSMD3	17q21.1	0.8812
54904	WHSC1L1	8p11.23	0.8794
2064	ERBB2	17q12	0.8786
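A ranking like Table 2 can be produced by sorting the per-gene coefficients in descending order. The sketch below is illustrative only: the function name is hypothetical, and skipping undefined (NaN) coefficients is an assumption. The example values are four rows copied from Table 2.

```python
def top_features(cors, n=20):
    """Return the n (gene, coefficient) pairs with the largest coefficients.

    NaN coefficients (c != c) are skipped; this handling is an assumption.
    """
    valid = [(gene, c) for gene, c in cors.items() if c == c]
    return sorted(valid, key=lambda pair: pair[1], reverse=True)[:n]

# Four rows taken from Table 2, in arbitrary order.
table_rows = {"ERBB2": 0.8786, "RBM48": 0.9432, "KRIT1": 0.9262, "MIEN1": 0.9348}
top3 = top_features(table_rows, n=3)
print(top3)
```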

Methods & Data

Input

Gene level (TCGA Level III) mRNAseq expression data and copy number data of the corresponding genes derived by the GISTIC pipeline. Pearson correlation coefficients were calculated for each gene shared by the two data sets, across all the samples common to both.

Correlation across samples

Pairwise correlations between the log2 copy numbers and expressions of each gene across samples were calculated using Pearson correlation.
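The calculation described above can be sketched as follows. This is a minimal standard-library illustration, not the pipeline's actual code. The nested-dict layout (gene -> sample -> value), the function names and the toy values are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)

def per_gene_correlation(copy_number, expression):
    """Correlate log2 copy number with expression for every shared gene.

    Both arguments are hypothetical dicts: gene -> {sample_id: value}.
    Only samples present in both data sets for a gene are used.
    """
    result = {}
    for gene in sorted(set(copy_number) & set(expression)):
        cn, ex = copy_number[gene], expression[gene]
        samples = sorted(set(cn) & set(ex))
        if len(samples) < 2:
            continue  # too few common samples to correlate
        result[gene] = pearson([cn[s] for s in samples],
                               [ex[s] for s in samples])
    return result

# Toy data (hypothetical): only GENE_A is present in both data sets.
copy_number = {
    "GENE_A": {"S1": -1.0, "S2": 0.0, "S3": 1.0, "S4": 2.0},
    "GENE_B": {"S1": 0.5, "S2": 0.1, "S3": -0.3, "S4": 0.0},
}
expression = {
    "GENE_A": {"S1": 2.0, "S2": 4.0, "S3": 6.0, "S5": 1.0},
    "GENE_C": {"S1": 1.0, "S2": 2.0},
}
cors = per_gene_correlation(copy_number, expression)
print(cors)
```

In the toy data, GENE_A has three common samples (S1 to S3) and its expression rises linearly with copy number, so its coefficient is at the upper bound of the Pearson range.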
