Purpose: Computer-aided diagnosis can use machine learning methods to classify lesions using radiomic features extracted from medical images. Applying these methods across separate populations, which may differ in both patient biology and image acquisition protocols, requires investigation into the potential need for harmonization. The aim of this study was to use dimension reduction techniques to understand differences in two populations of breast lesions and to investigate the use of post-image-reconstruction data harmonization methods.
Methods: Dynamic contrast-enhanced magnetic resonance (DCE-MR) images of breast lesions acquired in the United States and China were collected in a retrospective, IRB-approved, HIPAA-compliant study. The United States dataset comprised 267 benign lesions and 592 cancers; the China dataset comprised 481 benign lesions and 1069 cancers. Lesions were segmented using a fuzzy c-means method, and 38 radiomic features were automatically extracted. Dimension reduction from 38 features to two was then performed using t-distributed stochastic neighbor embedding (t-SNE). To investigate the effect of feature harmonization, the radiomic features were harmonized with ComBat prior to t-SNE, with lesion type (benign or cancer) and biopsy status at imaging (pre- or post-biopsy) as covariates to retain the biological nature of the lesions. The effect of harmonization was assessed separately for benign lesions and for cancers across the two populations using k-means clustering of the t-SNE values for the two groups, with the Davies-Bouldin metric, which relates intra-cluster scatter to inter-cluster distance, as the measure of cluster separation.
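For reference, the location-and-scale model underlying ComBat is commonly written as below. The notation is an assumption introduced for illustration and is not taken from this study: $i$ indexes the population (United States or China), $j$ the lesion, and $g$ the radiomic feature; $X_{ij}$ holds the covariates (lesion type and biopsy status); $\hat{\gamma}^{*}_{ig}$ and $\hat{\delta}^{*}_{ig}$ denote the empirical Bayes estimates of the additive and multiplicative population effects.

```latex
\begin{align}
Y_{ijg} &= \alpha_g + X_{ij}\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg}, \\
Y^{*}_{ijg} &= \frac{Y_{ijg} - \hat{\alpha}_g - X_{ij}\hat{\beta}_g - \hat{\gamma}^{*}_{ig}}{\hat{\delta}^{*}_{ig}}
              + \hat{\alpha}_g + X_{ij}\hat{\beta}_g .
\end{align}
```

Because the covariate term $X_{ij}\hat{\beta}_g$ is added back after the population effects are removed, variation attributable to lesion type and biopsy status is preserved in the harmonized features $Y^{*}_{ijg}$.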
Results: ComBat harmonization decreased the population-based separation of radiomic features after dimension reduction with t-SNE, as indicated by an increase in the Davies-Bouldin metric (higher values indicate less distinct clusters) for each type of lesion.
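To illustrate why a higher Davies-Bouldin value corresponds to weaker separation, the following minimal sketch computes the index for two toy sets of 2-D points. The coordinates, the function names (`centroid`, `scatter`, `davies_bouldin`), and the variable names are hypothetical and are not data or code from this study; cluster membership is given directly here rather than obtained from k-means.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def scatter(points, center):
    """Average Euclidean distance from each point to its cluster centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)

def davies_bouldin(clusters):
    """Davies-Bouldin index: for each cluster, take the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j) over the other
    clusters, then average over all clusters. Lower means better separated."""
    centers = [centroid(c) for c in clusters]
    scatters = [scatter(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k

# Hypothetical 2-D embeddings (toy values, not study data).
group_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
group_b_far = [(10, 0), (10, 2), (12, 0), (12, 2)]   # far from group_a
group_b_near = [(3, 0), (3, 2), (5, 0), (5, 2)]      # close to group_a

db_separated = davies_bouldin([group_a, group_b_far])
db_overlapping = davies_bouldin([group_a, group_b_near])
print(db_separated, db_overlapping)
```

In this toy case both groups have the same internal scatter, so moving the second group closer to the first raises the index. This is the direction of change reported above after harmonization.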
Conclusion: Post-image-reconstruction data harmonization methods may be useful when developing computer-aided diagnosis methods that combine databases of radiomic features from lesions imaged in different populations.
Funding Support, Disclosures, and Conflict of Interest: NCI U01 CA195564; NCI R15 CA227948. MLG: stockholder in R2 Technology/Hologic; cofounder/equity holder in Quantitative Insights; receives royalties from Hologic, GE Medical Systems, MEDIAN Technologies, Riverain Medical, Mitsubishi, and Toshiba. AE: Research Consultant, QView Medical, Inc. and Quantitative Insights, Inc. JP: Research Consultant, QView Medical, Inc.