The clinical information for each TCGA tumor sample is stored in an XML file whose entries include the patient ID, tumor characteristics, and treatment information. These XML files have been preprocessed for further analysis.
Clinical data are generated for the tier 1 clinical variables listed in Table 1.
Table 1. Tier 1 clinical variables.
| Tumor.Feature | Date.Statistics |
|---|---|
| gender | dccuploaddate |
| primarysiteofdesease | dateofbirth |
| histologicaltype | dateofdeath |
| tumorstage | dateoflastfollowup |
| tumorgrade | dateoftumorrecurrence |
| patienttumorrecurrencestatus | dateofinitialpathologicdiagnosis |
| radiationtherapy | datelastknownalive |
| neoadjuvanttherapy | vitalstatus |
| pathologicspread(pt) | |
| pathologicspread(pn) | |
| karnofskyperformancescore | |
Table 2. Statistics of selected clinical variables.
| Clinical.Variable | Statistics |
|---|---|
| age | mean: 62, std: 13 |
| vitalstatus | 3306 living, 1583 deceased |
| gender | 1835 male, 3044 female |
| histologicaltype | 364 colon adenocarcinoma, 56 colon mucinous adenocarcinoma, 23 untreated primary (de novo) gbm, 2 treated primary gbm, 325 head & neck squamous cell carcinoma, 502 kidney clear cell renal carcinoma, 215 lung adenocarcinoma- not otherwise specified (nos), 75 lung adenocarcinoma mixed subtype, 13 lung papillary adenocarcinoma, 2 lung mucinous adenocarcinoma, 7 mucinous (colloid) adenocarcinoma, 2 lung clear cell adenocarcinoma, 10 lung acinar adenocarcinoma, 15 lung bronchioloalveolar carcinoma nonmucinous, 3 lung bronchioloalveolar carcinoma mucinous, 2 lung solid pattern predominant adenocarcinoma, 3 lung micropapillary adenocarcinoma, 317 lung squamous cell carcinoma- not otherwise specified (nos), 7 lung basaloid squamous cell carcinoma, 1 lung small cell squamous cell carcinoma, 1 lung papillary squamous cell caricnoma, 1 lung papillary squamous cell carcinoma, 570 serous cystadenocarcinoma, 149 rectal adenocarcinoma, 13 rectal mucinous adenocarcinoma, 80 serous endometrial adenocarcinoma, 357 endometrioid endometrial adenocarcinoma, 18 mixed serous and endometrioid |
| patienttumorrecurrencestatus | 4889 without recurrence |
| tumorgrade | 251 grade 3, 106 grade 2, 90 grade 1, 8 high grade |
| tumorstage | 210 stage iia, 102 stage iiib, 120 stage iib, 45 stage iiic, 160 stage iv, 369 stage i, 182 stage iva, 4 stage iic, 147 stage ii, 197 stage iii, 120 stage iiia, 7 stage ivb, 136 stage ia, 210 stage ib, 403 iiic, 3 ib, 24 iiib, 87 iv, 19 iic, 3 ia, 8 iiia, 10 ic, 4 iib, 3 iia |
| pathologicspread(pt) | 534 t3, 9 t4b, 74 t4, 529 t2, 129 t4a, 168 t1, 1 t0, 1 tis, 35 tx, 150 t1b, 52 t3b, 161 t1a, 83 t2a, 120 t3a, 2 t3c, 30 t2b |
| pathologicspread(pn) | 1093 n0, 319 n1, 171 n2, 17 n1b, 74 n2b, 11 n2a, 17 n1a, 3 n1c, 327 nx, 34 n2c, 11 n3 |
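
Statistics of the kind shown in Table 2 can be sketched as follows: a mean and standard deviation for a numeric variable, and value counts for a categorical one. This is a minimal illustration, not the pipeline's own code. The function names are hypothetical, missing values are assumed to be coded as `NA`, and the sample standard deviation is used because the source does not say which estimator was applied.

```python
from collections import Counter
from statistics import mean, stdev

MISSING = "NA"  # assumed missing-value marker


def summarize_numeric(values):
    """Return mean and sample standard deviation, ignoring missing values."""
    nums = [float(v) for v in values if v != MISSING]
    return {"mean": mean(nums), "std": stdev(nums)}


def summarize_categorical(values):
    """Count how often each category occurs, ignoring missing values."""
    return Counter(v for v in values if v != MISSING)


# Toy input only; these are not the values behind Table 2.
age_stats = summarize_numeric(["73", "57", "65", "NA"])
status_counts = summarize_categorical(["0", "1", "1", "NA"])
print(age_stats)
print(dict(status_counts))
```
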
Table 3. Illustration of the tier 1 data for three patients.
| Clinical.Variable | Sample_1 | Sample_2 | Sample_3 |
|---|---|---|---|
| yearstobirth | 73 | 57 | 65 |
| daystodeath | NA | 223 | 81 |
| daystolastfollowup | 389 | NA | NA |
| vitalstatus | 0 | 1 | 1 |
| dccuploaddate | 15-2-2013 | 15-2-2013 | 15-2-2013 |
1. Each XML file is converted to a tab-delimited text file by our R package.
2. All text files are aggregated into a single table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the associated data for each sample.
3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
Figure 1. Diagram of the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.
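
The three processing steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the R package or the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines themselves. The XML layout, the name normalization, the `NA` fill value, and all function names are hypothetical; the variable names in `TIER1` are a subset taken from Table 1.

```python
import xml.etree.ElementTree as ET

# Subset of the tier 1 variable names from Table 1.
TIER1 = ["gender", "histologicaltype", "tumorstage", "vitalstatus"]


def normalize(tag):
    """Drop any XML namespace and punctuation: 'vital_status' -> 'vitalstatus'."""
    local = tag.rsplit("}", 1)[-1]
    return "".join(ch for ch in local.lower() if ch.isalnum())


def xml_to_entries(xml_text):
    """Step 1: flatten one sample's XML into {entry name: value} (leaf elements only)."""
    entries = {}
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            # Keep the first occurrence if an entry name repeats.
            entries.setdefault(normalize(elem.tag), elem.text.strip())
    return entries


def aggregate(samples):
    """Step 2: build one table; first column is the entry name, then one column per sample."""
    ids = list(samples)
    names = sorted({name for entries in samples.values() for name in entries})
    rows = [[name] + [samples[s].get(name, "NA") for s in ids] for name in names]
    return [["entry"] + ids] + rows


def pick_tier1(table, variables=TIER1):
    """Step 3: keep the header plus the rows for the tier 1 variables."""
    wanted = set(variables)
    return [table[0]] + [row for row in table[1:] if row[0] in wanted]


# Hypothetical, simplified XML; real TCGA clinical files are more deeply nested.
SAMPLE_1 = ("<patient><gender>FEMALE</gender><vital_status>LIVING</vital_status>"
            "<tumor_stage>Stage IIA</tumor_stage><batch>12</batch></patient>")
SAMPLE_2 = ("<patient><gender>MALE</gender>"
            "<vital_status>DECEASED</vital_status></patient>")

samples = {
    "Sample_1": xml_to_entries(SAMPLE_1),
    "Sample_2": xml_to_entries(SAMPLE_2),
}
tier1_table = pick_tier1(aggregate(samples))

# Emit the result as tab-delimited text, one row per clinical variable.
print("\n".join("\t".join(row) for row in tier1_table))
```

Keeping the entry name in the first column and one column per sample mirrors the table layout described in step 2, so the tier 1 selection in step 3 reduces to a row filter.
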