Room: AAPM ePoster Library
Purpose:
To improve the time efficiency of contouring organs at risk (OARs), machine learning techniques are being adopted in radiotherapy. Their main advantage is the ability to learn the most suitable representation of the data for a given task. We present our experience with training and comparative validation of a commercially available deep learning product.
Methods:
A cohort of 213 head and neck (H&N) patients was used to train the model on 12 individual OARs, all contoured and clinically approved by a single physician. A separate cohort of 85 cases was used for validation. The Dice Similarity Coefficient (DSC) and Jaccard Similarity Coefficient (JSC) were used to evaluate the model-generated contours against the expert’s contours. Additionally, a random sample of 20 CTs was selected and qualitatively evaluated by the physician. A published model trained for the same software at a different institution with 589 H&N patients was used for performance comparison.
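The two overlap metrics named above can be sketched as follows. This is a minimal illustration using the standard definitions, DSC = 2|A∩B| / (|A| + |B|) and JSC = |A∩B| / |A∪B|, applied to binary masks. It is not the commercial product's implementation; the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

def jaccard(mask_a, mask_b):
    """Jaccard Similarity Coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumption: same convention as in dice()
    return np.logical_and(a, b).sum() / union

# Hypothetical toy masks (not study data): two 6-voxel regions sharing 4 voxels.
auto = np.zeros((4, 4), dtype=bool)
auto[1:3, 1:4] = True
expert = np.zeros((4, 4), dtype=bool)
expert[1:3, 0:3] = True

dsc = dice(auto, expert)     # 2 * 4 / (6 + 6)
jsc = jaccard(auto, expert)  # 4 / 8
print(f"DSC = {dsc:.3f}, JSC = {jsc:.3f}")
```

In practice the masks would be 3D voxel arrays rasterized from the model's and the expert's structure sets on the same CT grid; that rasterization step is outside the scope of this sketch.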
Results:
For the 12 OARs, mean DSC ranged from 0.48 to 0.89 and mean JSC ranged from 0.32 to 0.80. None of the structures was judged ideal without some edits, but 98% of contours were still considered clinically useful with minor modifications. Overall, DSC values agreed well with the scores given by the physician. When the two deep-learning models were compared, the model trained with less data (213 vs 589 cases) returned higher-quality contours. This suggests that the quality of the data is more important than its quantity when building a high-performance model.
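As a plausibility check on the reported ranges, DSC and JSC for a single pair of contours are linked by the identity JSC = DSC / (2 − DSC). The sketch below applies it to the endpoints of the DSC range. The identity is exact only per contour pair; applied to means over a cohort it is an approximation, and the helper name is hypothetical.

```python
def jaccard_from_dice(dsc):
    """Convert a per-pair Dice value to the equivalent Jaccard value."""
    return dsc / (2.0 - dsc)

# Endpoints of the mean DSC range reported above.
low = jaccard_from_dice(0.48)
high = jaccard_from_dice(0.89)
print(f"JSC range implied by DSC range: {low:.2f} to {high:.2f}")
```

The implied values round to 0.32 and 0.80, consistent with the reported JSC range. Because the two metrics are monotonically related, they rank contours identically, so reporting both adds no independent information.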
Conclusion:
A DSC value ≥0.7 indicates low inter-observer variability. In our study, all but one of the contours were above this threshold. The qualitative assessment supported the reliability of DSC by demonstrating agreement between the expert’s evaluation and the DSC values. Deep learning auto-contouring could be a practical tool for speeding up contouring for treatment planning.