The clinical information for each TCGA tumor sample is stored in an XML file whose entries include the patient ID and tumor and treatment information. These XML files have been preprocessed for further analysis.
From the preprocessed files, data for the tier 1 clinical variables (Table 1) are generated.
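The R package that converts each XML file into a tab-delimited text file is not described here. As a minimal sketch, assuming every leaf element of the XML should become one `entry<TAB>value` line, the conversion could look like the following. The naming rule (namespace stripped, lowercased, underscores removed, so that a hypothetical `date_of_birth` element becomes `dateofbirth`), the `NA` placeholder for empty elements, and the function name `xml_to_tsv_lines` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}gender'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_tsv_lines(xml_text: str) -> list[str]:
    """Flatten every leaf element into an 'entry<TAB>value' line.

    Assumption (hypothetical naming rule): the entry name is the element's
    local tag name, lowercased with underscores removed; empty elements
    get the value 'NA'.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if len(elem) == 0:  # leaf element: it has no child elements
            name = local_name(elem.tag).lower().replace("_", "")
            value = (elem.text or "").strip() or "NA"
            lines.append(f"{name}\t{value}")
    return lines


# Toy XML document (not a real TCGA file).
example = """<patient xmlns="http://example.org/clin">
  <gender>FEMALE</gender>
  <date_of_birth>1940</date_of_birth>
  <follow_up><vital_status>LIVING</vital_status></follow_up>
</patient>"""

for line in xml_to_tsv_lines(example):
    print(line)
```

Only leaf elements are written, so container elements such as `follow_up` contribute their children's values but no line of their own. A real converter would also need a rule for repeated elements (for example, several follow-up records), which this sketch does not handle.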
Table 1. Tier 1 clinical variables.
Patient, tumor and treatment variables | Date and vital-status variables |
---|---|
gender | dccuploaddate |
primarysiteofdesease | dateofbirth |
histologicaltype | dateofdeath |
tumorstage | dateoflastfollowup |
tumorgrade | dateoftumorrecurrence |
patienttumorrecurrencestatus | dateofinitialpathologicdiagnosis |
radiationtherapy | datelastknownalive |
neoadjuvanttherapy | vitalstatus |
pathologicspread(pt) | |
pathologicspread(pn) | |
karnofskyperformancescore | |
Table 2. Statistics of selected clinical variables.
Clinical.Variable | Statistics |
---|---|
age | mean: 62, std: 13 |
vitalstatus | 6937 living, 3199 deceased |
gender | 3822 male, 6304 female |
histologicaltype | 774 colon adenocarcinoma, 119 colon mucinous adenocarcinoma, 46 untreated primary (de novo) gbm, 4 treated primary gbm, 712 head & neck squamous cell carcinoma, 1018 kidney clear cell renal carcinoma, 527 lung adenocarcinoma- not otherwise specified (nos), 161 lung adenocarcinoma mixed subtype, 32 lung papillary adenocarcinoma, 6 lung mucinous adenocarcinoma, 15 mucinous (colloid) adenocarcinoma, 4 lung clear cell adenocarcinoma, 22 lung acinar adenocarcinoma, 34 lung bronchioloalveolar carcinoma nonmucinous, 8 lung bronchioloalveolar carcinoma mucinous, 6 lung solid pattern predominant adenocarcinoma, 6 lung micropapillary adenocarcinoma, 672 lung squamous cell carcinoma- not otherwise specified (nos), 16 lung basaloid squamous cell carcinoma, 2 lung small cell squamous cell carcinoma, 2 lung papillary squamous cell caricnoma, 2 lung papillary squamous cell carcinoma, 1148 serous cystadenocarcinoma, 299 rectal adenocarcinoma, 27 rectal mucinous adenocarcinoma, 161 serous endometrial adenocarcinoma, 724 endometrioid endometrial adenocarcinoma, 36 mixed serous and endometrioid |
patienttumorrecurrencestatus | 10136 without recurrence |
tumorgrade | 511 grade 3, 214 grade 2, 180 grade 1, 16 high grade |
tumorstage | 457 stage iia, 212 stage iiib, 268 stage iib, 95 stage iiic, 327 stage iv, 753 stage i, 393 stage iva, 9 stage iic, 323 stage ii, 410 stage iii, 249 stage iiia, 14 stage ivb, 321 stage ia, 473 stage ib, 813 iiic, 6 ib, 48 iiib, 173 iv, 40 iic, 6 ia, 16 iiia, 20 ic, 8 iib, 6 iia |
pathologicspread(pt) | 1144 t3, 20 t4b, 150 t4, 1112 t2, 281 t4a, 352 t1, 2 t0, 2 tis, 70 tx, 334 t1b, 104 t3b, 349 t1a, 209 t2a, 242 t3a, 4 t3c, 76 t2b |
pathologicspread(pn) | 2372 n0, 684 n1, 360 n2, 35 n1b, 160 n2b, 22 n2a, 36 n1a, 6 n1c, 677 nx, 73 n2c, 24 n3 |
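The summaries in Table 2 are of two kinds: a mean and standard deviation for the continuous variable (age) and value counts for the categorical variables. A minimal sketch, assuming the tier 1 data have been loaded as one dictionary per patient with `NA` marking missing values; the document does not say whether the sample or population standard deviation was used, so the sample version here is an assumption, and the helper names are hypothetical.

```python
import statistics
from collections import Counter


def summarize_numeric(records, key):
    """Mean and sample standard deviation of a numeric variable, skipping NA."""
    values = [float(r[key]) for r in records if r.get(key, "NA") != "NA"]
    return statistics.mean(values), statistics.stdev(values)


def summarize_categorical(records, key):
    """Count each observed value of a categorical variable, skipping NA."""
    return Counter(r[key] for r in records if r.get(key, "NA") != "NA")


# Toy records (not real TCGA data).
records = [
    {"age": "73", "gender": "male", "vitalstatus": "living"},
    {"age": "57", "gender": "female", "vitalstatus": "deceased"},
    {"age": "NA", "gender": "female", "vitalstatus": "living"},
]

mean_age, sd_age = summarize_numeric(records, "age")
print(f"age | mean: {mean_age:.0f}, std: {sd_age:.0f}")  # age | mean: 65, std: 11
gender_counts = summarize_categorical(records, "gender")
print(", ".join(f"{n} {v}" for v, n in gender_counts.items()))  # 1 male, 2 female
```

Because raw values are counted verbatim, spelling variants in the source data (such as the two "lung papillary squamous cell" labels in Table 2) appear as separate categories unless they are normalized first.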
Table 3. Illustration of the tier 1 data for three patients.
Clinical.Variable | Sample_1 | Sample_2 | Sample_3 |
---|---|---|---|
yearstobirth | 73 | 73 | 57 |
daystodeath | NA | NA | 223 |
daystolastfollowup | 389 | 389 | NA |
vitalstatus | 0 | 0 | 1 |
dccuploaddate | 15-3-2013 | 15-3-2013 | 15-3-2013 |
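In Table 3, the patient with `vitalstatus` 1 (Sample_3) has a `daystodeath` value and no `daystolastfollowup`, while the patients with `vitalstatus` 0 have the reverse. Reading 1 as deceased and 0 as living is inferred from these rows rather than stated in the source. Under that assumption, the two day counts can be combined into a single follow-up time plus an event flag, as in this sketch; the function names `parse_days` and `survival_time` are hypothetical.

```python
from typing import Optional, Tuple


def parse_days(value: str) -> Optional[int]:
    """Convert a day count to int; 'NA' means the field does not apply."""
    return None if value == "NA" else int(value)


def survival_time(row: dict) -> Tuple[int, int]:
    """Return (days, event) for one patient column of the tier 1 table.

    Assumption: vitalstatus '1' = deceased (use daystodeath) and
    '0' = living (follow-up ends at daystolastfollowup).
    """
    event = int(row["vitalstatus"])
    key = "daystodeath" if event == 1 else "daystolastfollowup"
    days = parse_days(row[key])
    if days is None:
        raise ValueError(f"{key} is missing for a patient with vitalstatus {event}")
    return days, event


# The three patient columns of Table 3.
samples = {
    "Sample_1": {"daystodeath": "NA", "daystolastfollowup": "389", "vitalstatus": "0"},
    "Sample_2": {"daystodeath": "NA", "daystolastfollowup": "389", "vitalstatus": "0"},
    "Sample_3": {"daystodeath": "223", "daystolastfollowup": "NA", "vitalstatus": "1"},
}

for name, row in samples.items():
    print(name, survival_time(row))
```

Raising an error when the expected field is `NA` makes inconsistent records visible instead of silently producing a missing time.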
1. Each XML file is converted to a tab-delimited text file by our R package.
2. All text files are aggregated into one table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the XML entry names, and each remaining column holds the associated values for one sample.
3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
Figure 1. Diagram of the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.
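Steps 2 and 3 amount to an outer join of the per-sample entry/value files on the entry name, followed by a row filter. The internals of the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines are not described here, so the following is only a minimal sketch, assuming each converted file has one `entry<TAB>value` line per entry and that entries absent from a sample are filled with `NA`. The function names `aggregate` and `pick_tier1` are hypothetical, and the variable names are copied verbatim from Table 1 without checking them against raw XML entry names.

```python
import csv
import io

# Tier 1 variable names copied verbatim from Table 1.
TIER1_VARIABLES = {
    "gender", "primarysiteofdesease", "histologicaltype", "tumorstage",
    "tumorgrade", "patienttumorrecurrencestatus", "radiationtherapy",
    "neoadjuvanttherapy", "pathologicspread(pt)", "pathologicspread(pn)",
    "karnofskyperformancescore", "dccuploaddate", "dateofbirth",
    "dateofdeath", "dateoflastfollowup", "dateoftumorrecurrence",
    "dateofinitialpathologicdiagnosis", "datelastknownalive", "vitalstatus",
}


def read_entries(tsv_text: str) -> dict:
    """Parse one sample's 'entry<TAB>value' lines into a dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def aggregate(samples: dict) -> list:
    """Step 2: one row per entry name, one column per sample (NA if absent)."""
    parsed = {name: read_entries(text) for name, text in samples.items()}
    entries = sorted({e for d in parsed.values() for e in d})  # deterministic order
    header = ["entry"] + list(parsed)
    rows = [[e] + [parsed[s].get(e, "NA") for s in parsed] for e in entries]
    return [header] + rows


def pick_tier1(table: list, tier1: set) -> list:
    """Step 3: keep the header plus the rows whose entry is a tier 1 variable."""
    return [table[0]] + [row for row in table[1:] if row[0] in tier1]


# Toy converted files (not real TCGA data).
samples = {
    "Sample_1": "gender\tmale\nvitalstatus\tliving\nnotes\tfree text\n",
    "Sample_3": "gender\tfemale\nvitalstatus\tdeceased\n",
}
table = pick_tier1(aggregate(samples), TIER1_VARIABLES)
for row in table:
    print("\t".join(row))
```

The non-tier-1 entry `notes` survives aggregation (with `NA` for Sample_3) but is dropped by the picker, which mirrors the separation between the aggregation and picking steps in the workflow.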
