Room: Stars at Night Ballroom 2-3
Purpose: We observed that our deep learning based segmentation (DLseg) model could not achieve the same accuracy and consistency on our local dataset as on the challenge dataset. We hypothesized that subtle differences in the CT images were the cause. The purpose of this study is to investigate whether adding local cases to training can improve performance, how many cases are appropriate, and what the appropriate strategy is for selecting those training cases.
Materials & Methods: Our model was trained on the AAPM thoracic auto-segmentation challenge dataset (36 cases from 3 institutions) and achieved top performance. Forty-five thoracic patient cases were randomly selected from our clinical practice and split into 30 training and 15 test cases. The 30 training cases were ranked by the segmentation accuracy achieved by the original DLseg model (orig). New DLseg models were trained by adding the 10 best cases (addH10), the 10 worst cases (addL10), the 20 worst cases (addL20), and all 30 cases (add30). The performance of each model on the 15 held-out test cases was evaluated using Dice scores, mean surface distance (MSD), and 95% Hausdorff distance (HD95).
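The abstract does not include the evaluation code, but the three reported metrics can be sketched as follows. This is a minimal 2-D illustration over binary NumPy masks with distances in voxel units; the helper names (`dice_score`, `msd`, `hd95`, `surface_points`) are illustrative, not the study's actual implementation, and a real pipeline would operate on 3-D masks with physical voxel spacing.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    total = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / total if total else 1.0

def surface_points(mask):
    """Coordinates of boundary pixels: mask pixels with a background 4-neighbor."""
    m = mask.astype(bool)
    pad = np.pad(m, 1, constant_values=False)
    interior = (pad[:-2, 1:-1] & pad[2:, 1:-1] &
                pad[1:-1, :-2] & pad[1:-1, 2:])
    return np.argwhere(m & ~interior)

def surface_distances(pred, gt):
    """Symmetric collection of nearest surface-to-surface distances."""
    p, g = surface_points(pred), surface_points(gt)
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    return np.concatenate([d.min(axis=1), d.min(axis=0)])

def msd(pred, gt):
    """Mean surface distance."""
    return surface_distances(pred, gt).mean()

def hd95(pred, gt):
    """95th-percentile Hausdorff distance."""
    return np.percentile(surface_distances(pred, gt), 95)
```

Under this scheme, the 30 local training cases would simply be ranked by one of these scores (e.g. Dice) from the orig model to form the addH10/addL10/addL20 subsets.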
Results: Adding only 10 extra cases from the local institution dramatically improved segmentation accuracy and consistency. The "learn from mistakes" approach (addL10) showed no advantage over addH10, indicating that learning the subtle differences in the CT images is more important. Additional local training cases brought little further improvement to heart segmentation but had an observable effect on esophagus segmentation (median Dice from 0.77 to 0.80). Physician review and editing time dropped from 7.5±3.8 min to 2.7±1.0 min per case.
Conclusion: Adding cases from the local institution to the training dataset can improve the accuracy and robustness of the DLseg model on cases from that institution.