LUSC/00: Clinical Data Preprocess

The clinical information for each TCGA tumor sample is stored in an XML file. Patient ID, tumor, and treatment information are entries in that file. These XML files must be preprocessed before further analysis.
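To make the idea of "entries" concrete, here is a minimal Python sketch of flattening one clinical XML file into entry name and value pairs. It is an illustration only, not the R package used in this workflow. The sample record, its tag names, and the helper names (`local_name`, `flatten_entries`, `to_tab_delimited`) are all hypothetical, and real TCGA files are namespaced and far larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified clinical record (assumption: tag names
# and values are invented for illustration only).
SAMPLE_XML = """<patient>
  <bcr_patient_barcode>TCGA-XX-0001</bcr_patient_barcode>
  <gender>MALE</gender>
  <vital_status>LIVING</vital_status>
  <drugs>
    <drug><drug_name>Cisplatin</drug_name></drug>
  </drugs>
</patient>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def flatten_entries(xml_text):
    """Return (entry name, value) pairs for every leaf element with text."""
    root = ET.fromstring(xml_text)
    entries = []
    for element in root.iter():  # document order, root included
        has_children = len(element) > 0
        text = (element.text or "").strip()
        if not has_children and text:
            entries.append((local_name(element.tag), text))
    return entries


def to_tab_delimited(entries):
    """Render the pairs as one 'name<TAB>value' line per entry."""
    return "\n".join(f"{name}\t{value}" for name, value in entries)


if __name__ == "__main__":
    print(to_tab_delimited(flatten_entries(SAMPLE_XML)))
```

Container elements such as `drugs` carry no value of their own, so only leaf elements are kept. How nested or repeated entries should be named in practice is a design decision this sketch does not address.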


Clinical data for tier 1 clinical variables are generated.

The tier 1 clinical variables include: dateofbirth, dateofdeath, dateoflastfollowup, dateoftumorrecurrence, patienttumorrecurrencestatus, karnofskyperformancescore, histologicaltype, radiationtherapy, vitalstatus, dccuploaddate, neoadjuvanttherapy, primarysiteofdesease, tumorgrade, gender, dateofinitialpathologicdiagnosis, pathologicspread(pt), pathologicspread(pn), tumorstage, datelastknownalive


Tier 1 clinical data.

Methods & Data
Workflow

1. Each XML file is converted to a tab-delimited text file by our R package.

2. All text files are aggregated into one large table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the associated data for each sample.

3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
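Steps 2 and 3 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the Clinical_Aggregate_Tier1 or Clinical_Picker_Tier1 pipelines themselves. The function names, the sample IDs, the `entry` header label, and the `NA` placeholder for missing values are all hypothetical. The sketch also assumes entry names already match the tier 1 variable names, since the report does not describe how XML entry names are mapped to them.

```python
import csv
import io

# A subset of the tier 1 variable names listed in this report.
TIER1_VARIABLES = ["gender", "vitalstatus", "tumorstage"]


def read_entries(tsv_text):
    """Parse 'name<TAB>value' lines into a dict for one sample."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def aggregate(samples):
    """Build one table: entry names in column 1, one column per sample.

    `samples` maps a sample ID to that sample's entry dict. Entries a
    sample lacks are filled with 'NA' (an assumption of this sketch).
    """
    sample_ids = list(samples)
    names = []
    for entries in samples.values():
        for name in entries:
            if name not in names:  # keep first-seen order
                names.append(name)
    header = ["entry"] + sample_ids
    rows = [
        [name] + [samples[sid].get(name, "NA") for sid in sample_ids]
        for name in names
    ]
    return [header] + rows


def pick(table, wanted):
    """Keep the header plus the rows whose entry name is in `wanted`."""
    wanted = set(wanted)
    return [table[0]] + [row for row in table[1:] if row[0] in wanted]


if __name__ == "__main__":
    samples = {
        "S1": read_entries("gender\tMALE\nvitalstatus\tLIVING\nother\t1\n"),
        "S2": read_entries("gender\tFEMALE\ntumorstage\tStage II\n"),
    }
    for row in pick(aggregate(samples), TIER1_VARIABLES):
        print("\t".join(row))
```

Keeping aggregation and picking as separate functions mirrors the two pipelines: the full table is built once, and a different variable list can be selected from it without re-reading the per-sample files.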

Diagram of Clinical Data Dicer

Figure 1.  Diagram showing the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.