Room: Exhibit Hall | Forum 1
Purpose: To evaluate the performance of a deep-learning-based mammography reading algorithm on public datasets (DDSM, INbreast) and on datasets from our institution.
Methods: An end-to-end CNN architecture was used, with the first part of the architecture trained to classify image patches and the second part trained to classify the entire mammogram. The network was trained on DDSM, and the trained network was then transferred to INbreast and to our dataset to evaluate performance.
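The two-stage design above can be illustrated with a minimal structural sketch: a patch-level scorer feeds an image-level aggregator. The functions below are hypothetical stand-ins (the abstract does not specify the CNN, patch size, or aggregation rule); `classify_patch` uses mean intensity as a placeholder for the trained patch CNN's output.

```python
def extract_patches(image, patch_size):
    """Tile a 2-D image (list of rows) into non-overlapping square patches."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - patch_size + 1, patch_size):
        for c in range(0, w - patch_size + 1, patch_size):
            patches.append([row[c:c + patch_size] for row in image[r:r + patch_size]])
    return patches

def classify_patch(patch):
    """Stub patch classifier: returns a suspicion score in [0, 1].
    Mean intensity stands in for the CNN's patch-level output."""
    flat = [v for row in patch for v in row]
    return sum(flat) / len(flat)

def classify_image(image, patch_size=2):
    """Whole-image score: aggregate patch scores (max-pooling chosen
    here for illustration; the actual second stage is a trained network)."""
    scores = [classify_patch(p) for p in extract_patches(image, patch_size)]
    return max(scores)
```

In a real pipeline both stages are learned end to end; this sketch only shows the patch-to-image information flow.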
Results: For the patch classifier, the best validation AUC achieved was 0.88 on patches created from the DDSM ROI annotations. For classification of the entire image in the DDSM dataset, the best validation AUC achieved was 0.82. When the model was transferred to the INbreast dataset, an AUC of 0.93 was achieved. However, when the same transfer learning was applied to our clinical dataset, the best AUC achieved was only 0.70. A thorough comparison between these datasets showed significant differences in image contrast, subtlety score, presence of benign abnormal findings, and cancer staging. Our dataset comes from diagnostic rather than screening mammography, which partly explains why it is more challenging.
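All of the figures above are AUCs, which can be computed directly from classifier scores as the Mann-Whitney statistic: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counting half). A minimal sketch, not the evaluation code used in the study:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of classifier scores, aligned with labels.
    Returns the fraction of (positive, negative) pairs ranked correctly,
    with tied pairs counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive/negative pairs correctly, giving 0.75. In practice a library routine such as scikit-learn's `roc_auc_score` would be used instead.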
Conclusion: There can be a large discrepancy between the performance achieved on public mammography databases and that achieved in local clinical practice. We believe it is necessary to build an updated, high-quality mammography database that captures the pathological heterogeneity of breast cancer and benign lesions, reflecting current clinical practice.
Funding Support, Disclosures, and Conflict of Interest: We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.