The clinical information for each TCGA tumor sample is stored in an XML file. Each file contains entries such as the patient ID, tumor characteristics, and treatment information. These XML files have been preprocessed for further analysis.
The preprocessing generates clinical data for the tier 1 clinical variables listed in Table 1.
Table 1. Tier 1 clinical variables
Tumor.Feature | Date.Statistics |
---|---|
gender | dccuploaddate |
primarysiteofdesease | dateofbirth |
histologicaltype | dateofdeath |
tumorstage | dateoflastfollowup |
tumorgrade | dateoftumorrecurrence |
patienttumorrecurrencestatus | dateofinitialpathologicdiagnosis |
radiationtherapy | datelastknownalive |
neoadjuvanttherapy | vitalstatus |
pathologicspread(pt) | |
pathologicspread(pn) | |
karnofskyperformancescore | |
Table 2. Statistics of selected clinical variables.
Clinical.Variable | Statistics |
---|---|
age | mean: 61, std: 13 |
vitalstatus | 2639 living, 1195 deceased |
gender | 2619 female, 1205 male |
histologicaltype | 364 colon adenocarcinoma, 56 colon mucinous adenocarcinoma, 23 untreated primary (de novo) gbm, 2 treated primary gbm, 502 kidney clear cell renal carcinoma, 269 lung squamous cell carcinoma- not otherwise specified (nos), 7 lung basaloid squamous cell carcinoma, 1 lung small cell squamous cell carcinoma, 1 lung papillary squamous cell caricnoma, 1 lung papillary squamous cell carcinoma, 570 serous cystadenocarcinoma, 149 rectal adenocarcinoma, 13 rectal mucinous adenocarcinoma, 79 serous endometrial adenocarcinoma, 333 endometrioid endometrial adenocarcinoma, 18 mixed serous and endometrioid, 7 endometrioid endometrial adenocarcinoma (grade 3), 3 endometrioid endometrial adenocarcinoma (grade 1 or 2), 9 endometrioid endometrial adenocarcinoma (grade 1), 2 endometrioid endometrial adenocarcinoma (grade 2) |
patienttumorrecurrencestatus | 3834 without recurrence |
tumorgrade | 248 grade 3, 106 grade 2, 90 grade 1, 7 high grade |
tumorstage | 179 stage iia, 90 stage iiib, 60 stage iib, 45 stage iiic, 140 stage iv, 346 stage i, 26 stage iva, 4 stage iic, 97 stage ii, 152 stage iii, 59 stage iiia, 1 stage ivb, 44 stage ia, 100 stage ib, 403 iiic, 3 ib, 24 iiib, 87 iv, 19 iic, 3 ia, 8 iiia, 10 ic, 4 iib, 3 iia |
pathologicspread(pt) | 432 t3, 8 t4b, 56 t4, 292 t2, 24 t4a, 80 t1, 1 t0, 1 tis, 121 t1b, 52 t3b, 131 t1a, 36 t2a, 120 t3a, 2 t3c, 13 t2b |
pathologicspread(pn) | 750 n0, 196 n1, 107 n2, 17 n1b, 15 n2b, 7 n2a, 17 n1a, 3 n1c, 253 nx, 5 n3 |
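Summaries of the kind shown in Table 2 can be produced with a few lines of code. The following is a minimal Python sketch, not the code used to build the table: the function names are hypothetical, missing values are assumed to be coded as `NA`, and the sample standard deviation is assumed (the source does not say which estimator was used).

```python
from collections import Counter
from statistics import mean, stdev


def summarize_numeric(values):
    """Mean and sample standard deviation, ignoring missing ("NA") entries."""
    nums = [float(v) for v in values if v != "NA"]
    return {"mean": mean(nums), "std": stdev(nums)}


def summarize_categorical(values):
    """Count each category, ignoring "NA" entries, most frequent first."""
    return Counter(v for v in values if v != "NA").most_common()


# Toy input for illustration only; these are not the TCGA cohort values.
ages = ["70", "59", "56", "NA"]
status = ["living", "deceased", "living", "NA"]

print(summarize_numeric(ages))
print(summarize_categorical(status))
```

Keeping numeric and categorical variables in separate helpers mirrors the two kinds of rows in Table 2: a mean and standard deviation for age, and per-category counts for everything else.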
Table 3. Illustration of the tier 1 data for three patients
Clinical.Variable | Sample_1 | Sample_2 | Sample_3 |
---|---|---|---|
yearstobirth | 70 | 59 | 56 |
daystodeath | NA | NA | NA |
daystolastfollowup | 259 | 437 | 1320 |
vitalstatus | 0 | 0 | 0 |
dccuploaddate | 17-8-2012 | 17-8-2012 | 17-8-2012 |
1. Each XML file is converted to a tab-delimited text file by our R package.
2. All text files are aggregated into one large table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the associated data, one column per sample.
3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
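The three steps above can be sketched in Python as follows. This is an illustrative approximation, not the R package or the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines themselves. The function names, the `TIER1` subset, and the flat XML layout used in the example are all assumptions; real TCGA clinical XML is namespaced and more deeply nested.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical subset of tier 1 variables (names taken from Table 1).
TIER1 = ["gender", "vitalstatus", "tumorstage"]


def xml_to_rows(xml_text):
    """Step 1: flatten one sample's XML into (entry name, value) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if len(elem) == 0:  # keep leaf elements only
            name = elem.tag.split("}")[-1].lower()  # drop any namespace
            rows.append((name, (elem.text or "").strip() or "NA"))
    return rows


def aggregate(per_sample):
    """Step 2: merge per-sample rows into one table keyed by entry name."""
    samples = list(per_sample)
    table = {}
    for i, sample in enumerate(samples):
        for name, value in per_sample[sample]:
            # Entries missing for a sample stay "NA"; repeated tags overwrite.
            table.setdefault(name, ["NA"] * len(samples))[i] = value
    return samples, table


def pick_tier1(table, variables=TIER1):
    """Step 3: keep only the rows for the variables of interest."""
    return {v: table[v] for v in variables if v in table}


def to_tsv(samples, table):
    """Write the table as tab-delimited text, one column per sample."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Clinical.Variable"] + samples)
    for name, values in table.items():
        writer.writerow([name] + values)
    return out.getvalue()


# Made-up inputs for illustration only.
xml_a = "<patient><gender>FEMALE</gender><vitalstatus>LIVING</vitalstatus></patient>"
xml_b = "<patient><gender>MALE</gender><tumorstage>Stage I</tumorstage></patient>"
per_sample = {"Sample_1": xml_to_rows(xml_a), "Sample_2": xml_to_rows(xml_b)}
samples, table = aggregate(per_sample)
print(to_tsv(samples, pick_tier1(table)))
```

The resulting layout matches the description in step 2: entry names in the first column and one column of values per sample, with `NA` wherever a sample lacks an entry.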
Figure 1. Diagram of the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.
