The clinical information for each TCGA tumor sample is stored in an XML file whose entries include the patient ID, tumor characteristics, and treatment information. These XML files have been preprocessed for further analysis.
Clinical data are generated for the tier 1 clinical variables listed in Table 1.
Table 1. Tier 1 clinical variables
Tumor.Feature | Date.Statistics |
---|---|
gender | dccuploaddate |
primarysiteofdesease | dateofbirth |
histologicaltype | dateofdeath |
tumorstage | dateoflastfollowup |
tumorgrade | dateoftumorrecurrence |
patienttumorrecurrencestatus | dateofinitialpathologicdiagnosis |
radiationtherapy | datelastknownalive |
neoadjuvanttherapy | vitalstatus |
pathologicspread(pt) | |
pathologicspread(pn) | |
karnofskyperformancescore | |
Table 2. Statistics of selected clinical variables.
Clinical.Variable | Statistics |
---|---|
age | mean: 65, std: 10 |
vitalstatus | 637 living, 184 deceased |
gender | 367 male, 454 female |
histologicaltype | 527 lung adenocarcinoma- not otherwise specified (nos), 161 lung adenocarcinoma mixed subtype, 32 lung papillary adenocarcinoma, 6 lung mucinous adenocarcinoma, 15 mucinous (colloid) adenocarcinoma, 4 lung clear cell adenocarcinoma, 22 lung acinar adenocarcinoma, 34 lung bronchioloalveolar carcinoma nonmucinous, 8 lung bronchioloalveolar carcinoma mucinous, 6 lung solid pattern predominant adenocarcinoma, 6 lung micropapillary adenocarcinoma |
tumorstage | 43 stage iv, 115 stage iiia, 239 stage ib, 202 stage ia, 22 stage iiib, 114 stage iib, 66 stage iia, 6 stage i |
pathologicspread(pt) | 302 t2, 68 t3, 121 t1, 34 t4, 37 t2b, 68 t1b, 112 t2a, 72 t1a, 4 tx |
pathologicspread(pn) | 118 n2, 520 n0, 155 n1, 4 n3, 19 nx |
Table 3. Illustration of the tier 1 data for three patients
Clinical.Variable | Sample_1 | Sample_2 | Sample_3 |
---|---|---|---|
yearstobirth | 70 | 70 | 81 |
daystodeath | NA | NA | NA |
daystolastfollowup | 0 | 0 | 395 |
vitalstatus | 0 | 0 | 0 |
dccuploaddate | 31-8-2012 | 31-8-2012 | 31-8-2012 |
1. Each XML file is converted to a tab-delimited text file by our R package.
2. All text files are aggregated into one large table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the associated data, one column per sample.
3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
Figure 1. Diagram of the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.
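The three processing steps listed above can be illustrated with a minimal Python sketch. It is not the actual R package or the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines. The XML snippets, the function names, and the shortened variable list are hypothetical stand-ins, and missing entries are assumed to be filled with NA as in Table 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified clinical records; real TCGA XML files
# are namespaced and contain many more entries.
SAMPLE_XML = {
    "Sample_1": "<patient><gender>male</gender><vitalstatus>living</vitalstatus>"
                "<tumorstage>stage ia</tumorstage><batchnumber>1</batchnumber></patient>",
    "Sample_2": "<patient><gender>female</gender><vitalstatus>deceased</vitalstatus>"
                "<batchnumber>2</batchnumber></patient>",
}

# Assumed subset of the tier 1 variables from Table 1, for illustration only.
TIER1_VARIABLES = ["gender", "vitalstatus", "tumorstage"]


def xml_to_rows(xml_text):
    """Step 1: flatten one XML document into (entry name, value) pairs."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0:  # keep leaf elements only
            name = elem.tag.split("}")[-1].lower()  # drop any XML namespace
            rows.append((name, (elem.text or "").strip() or "NA"))
    return rows


def aggregate(rows_by_sample):
    """Step 2: merge per-sample rows into one table keyed by entry name."""
    samples = list(rows_by_sample)
    merged = {}
    for sample, rows in rows_by_sample.items():
        for name, value in rows:
            merged.setdefault(name, {})[sample] = value
    # One list of values per entry, aligned with the sample order.
    table = {name: [values.get(s, "NA") for s in samples]
             for name, values in merged.items()}
    return samples, table


def pick(table, variables):
    """Step 3: keep only the rows for the requested clinical variables."""
    return {name: table[name] for name in variables if name in table}


def to_tsv(samples, table):
    """Render a table as tab-delimited text, entry names in the first column."""
    lines = ["\t".join(["Clinical.Variable"] + samples)]
    lines += ["\t".join([name] + values) for name, values in table.items()]
    return "\n".join(lines)


samples, full_table = aggregate(
    {sample: xml_to_rows(text) for sample, text in SAMPLE_XML.items()}
)
tier1_table = pick(full_table, TIER1_VARIABLES)
print(to_tsv(samples, tier1_table))
```

Keeping the aggregated table keyed by entry name mirrors the layout described in step 2, so the picking step reduces to selecting rows by name.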
