
Using Dataset-Specific Feature Standardization to Improve Predictive Performance of Radiomic Models

A Chatterjee1*, M Vallieres1, A Dohan2, I Levesque1, Y Ueno3, S Saif2, C Reinhold2, J Seuntjens1, (1) McGill University, Medical Physics Unit, Montreal, (2) McGill University, Department of Radiology, Montreal, (3) Kobe University, Department of Radiology, Kobe


(Sunday, 7/14/2019) 4:00 PM - 5:00 PM

Room: Exhibit Hall | Forum 2

Purpose: We aimed to create a statistical methodology to improve outcome prediction on external datasets of uterine adenocarcinoma patients with endpoints of (a) lymphovascular space invasion (LVSI), and (b) FIGO stage, grouped as early (IA) and advanced.

Methods: The central idea involves (a) creating balanced training and testing sets by under-sampling the majority class, and (b) standardizing the training and testing sets separately. Standardization rescales each feature so that its distribution has zero mean and unit standard deviation. Standardizing features separately for each dataset reduces feature variability between datasets. In this retrospective proof-of-principle study, the teaching set (used for training and validation) contained 94 samples (Hospital X) and the testing set comprised 63 samples (Hospital Y). Six different MRI image sets were available for each patient. Features were divided into non-texture (e.g., morphological, histogram-based) and texture (e.g., matrix-based) groups, to see whether texture features, which are harder to interpret, were of benefit. The two prediction approaches were: (i) using single features, and (ii) combining only three features, using basic machine learning tools to avoid over-training. Feature selection was based on the statistical stability of features in the teaching set. Corrections for multiple hypothesis testing were applied whenever appropriate.
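The two preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the random synthetic data, the number of features, the class proportions, and the choice to balance before standardizing are all assumptions made for illustration. Only the sample counts (94 and 63) come from the abstract.

```python
import numpy as np

def standardize(features):
    """Rescale each feature column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features simply map to zero
    return (features - mean) / std

def undersample_majority(features, labels, rng):
    """Randomly drop samples so that every class keeps as many as the smallest class."""
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()
    return features[keep], labels[keep]

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two hospitals (assumed data, not from the study).
# The different offsets and scales mimic a scanner-dependent feature shift.
teach_X = rng.normal(loc=5.0, scale=2.0, size=(94, 4))
teach_y = (rng.random(94) < 0.3).astype(int)
test_X = rng.normal(loc=8.0, scale=3.0, size=(63, 4))
test_y = (rng.random(63) < 0.3).astype(int)

# (a) Balance each set by under-sampling its majority class.
teach_X_bal, teach_y_bal = undersample_majority(teach_X, teach_y, rng)
test_X_bal, test_y_bal = undersample_majority(test_X, test_y, rng)

# (b) Standardize each set with its OWN mean and standard deviation,
# rather than applying the teaching-set statistics to the testing set.
teach_Z = standardize(teach_X_bal)
test_Z = standardize(test_X_bal)
```

After this step, each feature has the same mean and spread in both sets, so a threshold or model fitted on the teaching set is applied to testing-set values on a comparable scale. Differences in distribution shape between the datasets remain.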

Results: When including texture features in addition to non-texture features, the best AUC for a single feature improved in the training set (FIGO: 0.78→0.82, LVSI: 0.75→0.78) and the testing set (FIGO: 0.78→0.81, LVSI: 0.75→0.79). When combining three texture features into a model, the best AUC was better than for a single feature in the training set (FIGO: 0.82→0.88, LVSI: 0.78→0.83). However, this improvement was not observed in the testing set (FIGO: 0.81→0.79, LVSI: 0.79→0.76).

Conclusion: Our methodology yielded model performances that are statistically significant and match (FIGO stage) or surpass (LVSI) the performance of an expert radiologist. We believe standardization cannot eliminate differences between datasets but is a pragmatic approach for a retrospective study.

Funding Support, Disclosures, and Conflict of Interest: This work was supported in part by the CREATE Medical Physics Research Training Network grant of the Natural Sciences and Engineering Research Council (Grant number: 432290), by the Strategic Training in Transdisciplinary Radiation Science for the 21st Century (STARS21) program, and by the Canadian Institutes of Health Research Foundation Grant FDN-143257.

