
Clustering for Non-Redundant Feature Selection in Radiomics for Breast Cancer Risk Assessment

K Mendel*, S Porter, H Li, L Lan, D Schacht, M Giger, University of Chicago, Chicago, IL


(Tuesday, 7/31/2018) 7:30 AM - 9:30 AM

Room: 202

Purpose: Given the large number of radiomic features available in medical imaging, we propose a method to identify features that are non-redundant, robust, and relevant. We illustrate this method by classifying the presence of breast cancer risk factors using radiomic features from digital mammography.

Methods: Full-field digital mammograms (360 total) were retrospectively collected from 185 patients, of whom 102 had known high-risk factors for breast cancer and 83 did not. Each patient was imaged on systems from two vendors, with the exams separated by about one year. A total of 291 radiomic texture features were extracted from each ROI covering the central parenchymal tissue. Redundant features were grouped by unsupervised clustering, and the most robust feature in each cluster was identified through pairwise comparisons of feature values on each patient's ROIs from the two vendors. Performance was compared across three clustering methods (hierarchical, k-means, and fuzzy c-means). Cluster repeatability was assessed with the Rand index. Stepwise feature selection then identified the most relevant of these robust, non-redundant features for use in a support vector machine classifier evaluated with leave-one-out cross-validation. The area under the ROC curve (AUC) served as the figure of merit for classifying the presence of risk factors.
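The cluster-then-select step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It shows only the hierarchical variant (average linkage on a 1 − |Pearson r| distance between features), and it scores robustness as the Pearson correlation of each feature between the two vendors' ROIs, paired by patient. The abstract specifies neither the distance nor the robustness statistic, so both are assumptions, and `select_robust_representatives` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def select_robust_representatives(feats_a, feats_b, n_clusters):
    """Cluster redundant features, then keep the most vendor-robust one per cluster.

    feats_a, feats_b: arrays of shape (n_patients, n_features) holding the same
    radiomic features computed on ROIs from vendor A and vendor B (row i = patient i).
    Returns a sorted list of selected feature column indices.
    """
    # Assumption: redundancy is measured as 1 - |Pearson r| between features,
    # computed on the pooled ROIs from both vendors.
    pooled = np.vstack([feats_a, feats_b])
    corr = np.corrcoef(pooled, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry before condensing
    condensed = squareform(dist, checks=False)

    # Average-linkage hierarchical clustering, cut into at most n_clusters groups.
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Assumption: robustness = Pearson r of a feature between the two vendors,
    # paired by patient (higher means more reproducible across vendors).
    robustness = np.array([
        np.corrcoef(feats_a[:, j], feats_b[:, j])[0, 1]
        for j in range(feats_a.shape[1])
    ])

    # Keep the single most robust feature from each cluster.
    selected = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        selected.append(int(members[np.argmax(robustness[members])]))
    return sorted(selected)
```

Swapping in k-means or fuzzy c-means would change only how `labels` is produced; the per-cluster selection by robustness stays the same.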

Results: Clustering was consistent across patient subsets for all three clustering methods (Rand index > 0.94). Cluster numbers from n = 30 to 100 were explored. For n = 100 clusters, hierarchical clustering (AUC = 0.725 ± 0.026) and k-means clustering (AUC = 0.731 ± 0.026) yielded significantly higher performance than fuzzy c-means clustering (AUC = 0.615 ± 0.029) after Holm-Bonferroni correction.
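Two statistics in the results can be made concrete. The Rand index is the fraction of item pairs on which two clusterings agree (both place the pair in the same cluster, or both separate it). Holm-Bonferroni is a step-down correction that compares the k-th smallest of m p-values with α/(m − k + 1) and stops at the first failure. The sketch below is plain Python with hypothetical function names; the abstract does not state which pairwise test produced the p-values or how patient subsets were drawn, so neither is modeled here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if both clusterings put it together, or both put it apart.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def holm_bonferroni(p_values, alpha=0.05):
    """Return booleans, True where the hypothesis is rejected (Holm step-down)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # stop at the first non-significant comparison
    return rejected
```

Because the Rand index ignores how clusters are numbered, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0. With three pairwise method comparisons, Holm's thresholds are α/3, α/2, and α.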

Conclusion: We propose a cluster-based method to handle the large number of features encountered in radiomics research. The method identifies robust, non-redundant features that are relevant to the clinical question of interest. Comparison of clustering methods suggests that hierarchical and k-means clustering outperform fuzzy c-means clustering in characterizing the presence of breast cancer risk factors.

Funding Support, Disclosures, and Conflict of Interest: Supported, in part, by the NIBIB of the NIH under grant number T32-EB002103 and by the NCI under grant number NIH-QIN-U01-195564. M.L.G. is a stockholder in R2 Technology/Hologic and a cofounder and shareholder in Quantitative Insights. M.L.G. receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverain Medical, Mitsubishi, and Toshiba.


Mammography, Risk, Breast


IM- Breast x-ray Imaging: CAD
