The clinical information for each TCGA tumor sample is stored in an XML file. The patient ID, tumor information, and treatment information are entries of the XML file. These XML files have been preprocessed for further analysis.
Clinical data for tier 1 clinical variables are generated.
Table 1. Tier 1 clinical variables
| Tumor.Feature | Date.Statistics |
|---|---|
| gender | dccuploaddate |
| primarysiteofdesease | dateofbirth |
| histologicaltype | dateofdeath |
| tumorstage | dateoflastfollowup |
| tumorgrade | dateoftumorrecurrence |
| patienttumorrecurrencestatus | dateofinitialpathologicdiagnosis |
| radiationtherapy | datelastknownalive |
| neoadjuvanttherapy | vitalstatus |
| pathologicspread(pt) | |
| pathologicspread(pn) | |
| karnofskyperformancescore | |
Table 2. Statistics of selected clinical variables.
| Clinical.Variable | Statistics |
|---|---|
| age | mean: NaN, std: NA |
| vitalstatus | 3307 living, 1491 deceased |
| gender | 1788 male, 3000 female |
| histologicaltype | 363 colon adenocarcinoma, 56 colon mucinous adenocarcinoma, 541 untreated primary (de novo) gbm, 20 treated primary gbm, 227 head & neck squamous cell carcinoma, 502 kidney clear cell renal carcinoma, 84 kidney papillary renal cell carcinoma, 42 astrocytoma, 48 oligodendroglioma, 27 oligoastrocytoma, 177 lung adenocarcinoma- not otherwise specified (nos), 61 lung adenocarcinoma mixed subtype, 10 lung papillary adenocarcinoma, 2 lung mucinous adenocarcinoma, 3 mucinous (colloid) adenocarcinoma, 2 lung clear cell adenocarcinoma, 3 lung acinar adenocarcinoma, 3 lung bronchioloalveolar carcinoma mucinous, 8 lung bronchioloalveolar carcinoma nonmucinous, 2 lung micropapillary adenocarcinoma, 1 lung solid pattern predominant adenocarcinoma, 261 lung squamous cell carcinoma- not otherwise specified (nos), 6 lung basaloid squamous cell carcinoma, 1 lung papillary squamous cell caricnoma, 1 lung papillary squamous cell carcinoma, 570 serous cystadenocarcinoma, 149 rectal adenocarcinoma, 13 rectal mucinous adenocarcinoma, 13 stomach adenocarcinoma - diffuse type, 98 stomach adenocarcinoma - not otherwise specified (nos), 7 stomach intestinal adenocarcinoma - mucinous type, 10 stomach intestinal adenocarcinoma - tubular type, 3 stomach intestinal adenocarcinoma - papillary type, 26 stomach intestinal adenocarcinoma - type not otherwise specified (nos), 60 uterine serous endometrial adenocarcinoma, 121 endometrioid endometrial adenocarcinoma (grade 3), 104 endometrioid endometrial adenocarcinoma (grade 1 or 2), 15 mixed serous and endometrioid, 57 endometrioid endometrial adenocarcinoma (grade 2), 34 endometrioid endometrial adenocarcinoma (grade 1) |
| tumorgrade | 830 g3, 487 g2, 39 g1, 28 gx, 72 g4, 1 gb, 165 grade 3, 107 grade 2, 89 grade 1, 26 high grade |
| tumorstage | 107 stage iiib, 196 stage iia, 88 stage iib, 46 stage iiic, 186 stage iv, 402 stage i, 130 stage iva, 4 stage iic, 185 stage ii, 215 stage iii, 134 stage iiia, 5 stage ivb, 105 stage ia, 3 stage ivc, 210 stage ib, 403 iiic, 3 ib, 24 iiib, 87 iv, 19 iic, 3 ia, 8 iiia, 10 ic, 4 iib, 3 iia |
| pathologicspread(pt) | 593 t3, 12 t4b, 81 t4, 523 t2, 109 t4a, 157 t1, 1 t0, 1 tis, 32 tx, 140 t1b, 61 t3b, 165 t1a, 68 t2a, 126 t3a, 2 t3c, 32 t2b |
| pathologicspread(pn) | 351 n1, 188 n2, 1055 n0, 17 n1b, 60 n2b, 13 n2a, 17 n1a, 3 n1c, 346 nx, 28 n2c, 22 n3, 1 n3a |
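Summaries in the style of Table 2 can be produced by tallying the values of each variable across samples. The following is a minimal sketch, not the report's actual code: the function names `summarize_categorical` and `summarize_numeric` are hypothetical, and it assumes missing values are encoded as the string `NA`. When no numeric values are available, the numeric summary returns NaN.

```python
from collections import Counter
import math
import statistics


def summarize_categorical(values):
    """Count each non-missing category and format it like a Table 2 cell."""
    # Assumption: missing values are stored as the string "NA".
    counts = Counter(v for v in values if v != "NA")
    return ", ".join(f"{n} {label}" for label, n in counts.most_common())


def summarize_numeric(values):
    """Return (mean, sample standard deviation), or NaN where undefined."""
    nums = [float(v) for v in values if v != "NA"]
    mean = statistics.mean(nums) if nums else math.nan
    # The sample standard deviation needs at least two values.
    std = statistics.stdev(nums) if len(nums) > 1 else math.nan
    return mean, std


print(summarize_categorical(["living", "deceased", "living", "NA"]))
print(summarize_numeric(["NA", "NA"]))
```

Note that a tally like this treats differently spelled labels (for example `g3` and `grade 3`, or `stage iiic` and `iiic`) as separate categories, which matches how they appear in Table 2.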
Table 3. Illustration of the tier 1 data for three patients
| Clinical.Variable | Sample_ID | Sample_ID | Sample_ID |
|---|---|---|---|
| daystodeath | sample_1 | sample_2 | sample_3 |
| daystolastfollowup | 389 | 223 | 81 |
| karnofskyperformancescore | NA | NA | NA |
| histologicaltype | NA | NA | NA |
| vitalstatus | 0 | 1 | 1 |
1. Each XML file is converted to a tab-delimited text file by our R package.
2. All text files are aggregated into one large table by the Clinical_Aggregate_Tier1 pipeline. The first column of the table holds the entry names from the XML files, and the remaining columns hold the associated data for each sample.
3. Data for the tier 1 clinical variables are extracted by the Clinical_Picker_Tier1 pipeline.
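The three steps above can be sketched in Python. This is only an illustration under stated assumptions, not the R package or the Clinical_Aggregate_Tier1 and Clinical_Picker_Tier1 pipelines themselves: the XML layout, the helper names (`xml_to_rows`, `aggregate`, `pick`, `to_tsv`), the `NA` placeholder for missing entries, and the lowercasing of names and values are all hypothetical choices made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A few tier 1 variable names from Table 1 (subset chosen for illustration).
TIER1 = ["gender", "vitalstatus", "tumorstage"]

# Hypothetical, simplified clinical XML; real files are larger and namespaced.
XML_1 = ("<patient><gender>FEMALE</gender>"
         "<vital_status>LIVING</vital_status><batch>12</batch></patient>")
XML_2 = ("<patient><gender>MALE</gender>"
         "<vital_status>DECEASED</vital_status></patient>")


def xml_to_rows(xml_text):
    """Step 1: flatten one XML document into (entry name, value) rows."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0:  # leaf elements carry the values
            # Drop any "{namespace}" prefix and normalize the entry name.
            name = elem.tag.split("}")[-1].replace("_", "").lower()
            value = (elem.text or "").strip().lower() or "NA"
            rows.append((name, value))
    return rows


def aggregate(samples):
    """Step 2: merge per-sample rows into one table keyed by entry name."""
    sample_ids = list(samples)
    table = {}
    for sid in sample_ids:
        for name, value in samples[sid]:
            table.setdefault(name, {})[sid] = value
    # Entries missing for a sample are filled with "NA".
    merged = {name: [vals.get(sid, "NA") for sid in sample_ids]
              for name, vals in table.items()}
    return merged, sample_ids


def pick(table, variables):
    """Step 3: keep only the rows for the requested clinical variables."""
    return {v: table[v] for v in variables if v in table}


def to_tsv(table, sample_ids):
    """Write the table as tab-delimited text, entry names in column 1."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["entry"] + sample_ids)
    for name, values in table.items():
        writer.writerow([name] + values)
    return buf.getvalue()


samples = {"sample_1": xml_to_rows(XML_1), "sample_2": xml_to_rows(XML_2)}
full_table, ids = aggregate(samples)
print(to_tsv(pick(full_table, TIER1), ids))
```

In this sketch the aggregated table keeps every entry found in any XML file, and the picker simply filters its rows, which mirrors the separation between the aggregation and extraction steps.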
Figure 1. Diagram of the workflow for processing clinical data. Clinical variables of interest and their associated values are marked in red and blue, respectively.