Preprocessing of clinical data

Lung Adenocarcinoma (MOLECULAR_NONSMOKER)

07 February 2013 | awg_luad__2013_02_07

Maintainer Information

Maintained by TCGA GDAC Team (Broad Institute/MD Anderson Cancer Center/Harvard Medical School)

Citation Information

Cite as Broad Institute TCGA Genome Data Analysis Center (2013): Preprocessing of clinical data. Broad Institute of MIT and Harvard. doi:10.7908/C1028PNK

Overview

Introduction

The clinical information for each TCGA tumor sample is stored in an XML file whose entries include the patient ID, tumor characteristics, and treatment information. These XML files have been preprocessed for further analysis.

Summary

Data were generated for the tier 1 clinical variables listed in Table 1.

Table 1. Tier 1 clinical variables

Tumor.Feature	Date.Statistics
gender	dccuploaddate
primarysiteofdesease	dateofbirth
histologicaltype	dateofdeath
tumorstage	dateoflastfollowup
tumorgrade	dateoftumorrecurrence
patienttumorrecurrencestatus	dateofinitialpathologicdiagnosis
radiationtherapy	datelastknownalive
neoadjuvanttherapy	vitalstatus
pathologicspread(pt)
pathologicspread(pn)
karnofskyperformancescore

Results

Tier 1 Data Statistics

Table 2. Statistics of selected clinical variables.

Clinical.Variable	Statistics
age	mean: 66, std: 11
vitalstatus	57 living, 23 deceased
gender	34 male, 46 female
histologicaltype	18 lung adenocarcinoma mixed subtype, 46 lung adenocarcinoma- not otherwise specified (nos), 1 lung mucinous adenocarcinoma, 2 mucinous (colloid) adenocarcinoma, 2 lung papillary adenocarcinoma, 3 lung acinar adenocarcinoma, 3 lung bronchioloalveolar carcinoma mucinous, 4 lung bronchioloalveolar carcinoma nonmucinous, 1 lung micropapillary adenocarcinoma
tumorstage	17 stage ia, 4 stage iv, 25 stage ib, 14 stage iiia, 7 stage iia, 8 stage iib, 2 stage iiib
pathologicspread(pt)	16 t1, 34 t2, 7 t3, 11 t2a, 4 t1b, 3 t4, 3 t2b, 1 t1a
pathologicspread(pn)	49 n0, 1 nx, 15 n2, 14 n1
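
As a rough illustration of how summaries of the kind shown in Table 2 could be computed, the sketch below formats a mean and standard deviation for a numeric variable and per-category counts for a categorical one. It is not the code used to produce this report. The function names are hypothetical, the inputs are toy values, and the choice of the sample standard deviation is an assumption, since the report does not state which estimator was used.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_numeric(values):
    """Format the mean and sample standard deviation, skipping 'NA' entries."""
    nums = [float(v) for v in values if v != "NA"]
    # Assumption: sample standard deviation (n - 1 denominator).
    return f"mean: {round(mean(nums))}, std: {round(stdev(nums))}"


def summarize_categorical(values):
    """Format per-category counts, most frequent first, skipping 'NA' entries."""
    counts = Counter(v for v in values if v != "NA")
    return ", ".join(f"{n} {label}" for label, n in counts.most_common())


# Toy inputs for illustration only; these are not the cohort data.
toy_ages = ["70", "56", "67", "NA"]
toy_status = ["living", "deceased", "living"]

print("age\t" + summarize_numeric(toy_ages))
print("vitalstatus\t" + summarize_categorical(toy_status))
```

Missing values are assumed to be encoded as the string NA, as in Table 3.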

Tier 1 Data

Table 3. Illustration of the tier 1 data for three patients

Clinical.Variable	Sample_1	Sample_2	Sample_3
yearstobirth	70	56	67
daystodeath	NA	244	NA
daystolastfollowup	NA	NA	NA
vitalstatus	0	1	0
dccuploaddate	31-8-2012	31-8-2012	31-8-2012

Methods & Data

Work Flow

1. Each XML file is converted to a tab-delimited text file by our R package.

2. All text files are aggregated into a single table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the corresponding data for each sample.

3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
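
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R package or the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines themselves. The toy XML layout, the tag-name normalization, the TIER1 subset, and all function names are hypothetical; only the overall shape (flatten each file, aggregate with entry names in the first column, then pick the tier 1 rows) follows the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the tier 1 variable names listed in Table 1.
TIER1 = ["gender", "histologicaltype", "tumorstage", "vitalstatus"]


def flatten_xml(xml_text):
    """Step 1: map each leaf element's normalized tag name to its text."""
    root = ET.fromstring(xml_text)
    entries = {}
    for elem in root.iter():
        if len(elem) == 0:  # leaf element (no child elements)
            tag = elem.tag.split("}")[-1]  # drop an XML namespace, if any
            name = tag.replace("_", "").lower()  # assumed normalization
            text = (elem.text or "").strip()
            entries[name] = text if text else "NA"
    return entries


def aggregate(samples):
    """Step 2: build one table; first column is the entry name, then one column per sample."""
    names = sorted({name for entries in samples.values() for name in entries})
    sample_ids = list(samples)
    header = ["Clinical.Variable"] + sample_ids
    rows = [[n] + [samples[s].get(n, "NA") for s in sample_ids] for n in names]
    return [header] + rows


def pick_tier1(table, variables=TIER1):
    """Step 3: keep the header plus the rows whose entry name is a tier 1 variable."""
    return [table[0]] + [row for row in table[1:] if row[0] in variables]


def to_tsv(table):
    """Render the table as tab-delimited text."""
    return "\n".join("\t".join(row) for row in table)


# Toy XML documents for illustration only; real TCGA clinical XML is far richer.
xml_a = "<patient><gender>FEMALE</gender><tumor_stage>Stage IA</tumor_stage><batch>1</batch></patient>"
xml_b = "<patient><gender>MALE</gender><vital_status>DECEASED</vital_status></patient>"

samples = {"Sample_1": flatten_xml(xml_a), "Sample_2": flatten_xml(xml_b)}
print(to_tsv(pick_tier1(aggregate(samples))))
```

Entries missing from a sample are filled with NA here so that every row has one value per sample, matching the layout of Table 3.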

Diagram of Clinical Data Dicer

Figure 1. Diagram that displays the work flow of processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.

Download Results

This is an experimental feature. The full results of the analysis summarized in this report can be downloaded from the TCGA Data Coordination Center.
