Room: Exhibit Hall | Forum 2
Purpose: To propose a new automated technique for mapping structure names from DICOM structure sets to a defined naming schema that will enable data abstraction for big data research.
Methods: DICOM structure sets from approximately 1500 prostate and lung patients from 40 centers were used to determine the variability of names and identify the challenge of mapping them to the TG-263 naming schema. A regular expression (RE) method was compared with a volumetric machine learning model to automate this mapping process. Volumetric bitmaps were created from the contoured volumes and converted to feature vectors, and dimensionality reduction was performed using Principal Component Analysis (PCA). The resulting vectors were used to train a Support Vector Machine (SVM) model in Apache Spark with 5-fold cross validation to automatically map the structure names.
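The volumetric pipeline described in the Methods (bitmap, feature vector, PCA, SVM, 5-fold cross validation) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it substitutes scikit-learn for Apache Spark, uses synthetic shapes (a ball and an elongated box) in place of real contoured volumes, and the grid size, number of PCA components, and SVM kernel are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

GRID = 16  # hypothetical bitmap resolution per axis (assumption)


def synthetic_bitmap(label, rng):
    """Return a GRID^3 binary mask standing in for a contoured volume.

    Label 0 is a ball, label 1 is a box elongated along the first axis.
    Real data would come from rasterised DICOM contours instead.
    """
    zz, yy, xx = np.indices((GRID, GRID, GRID))
    centre = GRID / 2 + rng.uniform(-1, 1, size=3)
    if label == 0:
        radius = rng.uniform(4, 6)
        dist2 = (zz - centre[0]) ** 2 + (yy - centre[1]) ** 2 + (xx - centre[2]) ** 2
        mask = dist2 <= radius ** 2
    else:
        half = rng.uniform(1.5, 2.5)
        mask = (np.abs(yy - centre[1]) <= half) & (np.abs(xx - centre[2]) <= half)
    return mask.astype(np.float32)


def build_dataset(n_per_class=40, seed=0):
    """Flatten each bitmap into one feature vector per structure."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_class)
    features = np.stack([synthetic_bitmap(lab, rng).ravel() for lab in labels])
    return features, labels


def evaluate(features, labels, n_components=10):
    """Reduce dimensionality with PCA, then score an SVM with 5-fold CV."""
    model = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
    return cross_val_score(model, features, labels, cv=5)


features, labels = build_dataset()
scores = evaluate(features, labels)
print("Accuracy per fold:", scores)
```

Placing PCA inside the pipeline means it is refit on the training portion of each fold, so the held-out structures do not influence the projection.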
Results: Over 7500 different structure names were identified in this dataset, showing large variability and multiple expressions for the same anatomical volume. The RE method found matches for structures with a high degree of naming consistency, such as rectum and heart. However, it did not perform well for structures with high variability in names. The SVM model is based on the shape, size and volume of each structure and hence outperforms the RE method for all types of structures [Accuracy: Femur_R (SVM: 0.9354, RE: 0.5101) and Prostate PTV (SVM: 0.8865, RE: 0.0810)].
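To illustrate why an RE approach works for consistently named structures but misses highly variable ones, here is a minimal sketch of name mapping with regular expressions. The patterns, the `map_name` helper, and the sample input names are hypothetical and are not the expressions used in this work; the target names are shown only as examples of standardized labels.

```python
import re

# Hypothetical patterns for illustration; each maps raw names to one standard label.
FEMUR = r"fem(ur|oral[ _-]*head)?"
SIDE_R = r"r(t|ight)?"
PATTERNS = [
    (re.compile(r"^rect(um)?$", re.IGNORECASE), "Rectum"),
    (re.compile(r"^heart$", re.IGNORECASE), "Heart"),
    (
        re.compile(
            rf"^({SIDE_R}[ _-]*{FEMUR}|{FEMUR}[ _-]*{SIDE_R})$", re.IGNORECASE
        ),
        "Femur_R",
    ),
]


def map_name(raw):
    """Return the standard label for a raw structure name, or None if unmatched."""
    name = raw.strip()
    for pattern, standard in PATTERNS:
        if pattern.match(name):
            return standard
    return None


for raw in ["RECTUM", " heart ", "Rt Femur", "Femur_R", "PTV_7800_final"]:
    print(repr(raw), "->", map_name(raw))
```

Names such as the made-up `PTV_7800_final` fall through unmatched, which is the failure mode a purely name-based method faces when target volumes are labeled freely.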
Conclusion: This work shows that inconsistency in structure set naming makes it difficult for simple RE methods to perform accurate mappings. Machine learning models such as SVMs use shape, size and volume-based properties, which have the potential to outperform RE methods. The accuracy of this model may be further improved by adding information from regular expressions, imaging segmentation techniques, and the relative position of the organ in the body.