Room: Davidson Ballroom A
Purpose: Inverse treatment planning (ITP) is often formulated as an optimization problem containing multiple terms designed for clinical considerations. Because plan quality depends on the relative weights among these terms, human planners are involved in the weight-tuning process to achieve a satisfactory result. Not only does this require intensive effort, but the resulting plan quality is also affected by factors such as planner experience. In this study, motivated by recent advances in deep learning, we develop a deep reinforcement learning (DRL) based method to automatically adjust the weights in a human-like manner. We demonstrate our idea in an example problem of ITP for high-dose-rate brachytherapy (HDRBT) in cervical cancer.
Methods: We consider an optimization problem whose objective function is a weighted sum of organ doses. A virtual planner network (VPN) is established, which observes dose-volume histograms and outputs the direction and amplitude of adjustments to the organ weights. For training purposes, we define a reward function as a weighted sum of D2cc values of critical organs, since D2cc is a clinical quantity typically used for plan evaluation in cervical cancer HDRBT. The VPN is trained via DRL with an epsilon-greedy algorithm on five patient cases, learning to output actions that maximize the reward. We then test the VPN on another five cases. In each case, starting from randomly initialized weights, the VPN is applied repeatedly until the plan quality cannot be further improved.
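The epsilon-greedy weight-tuning loop described above can be sketched as follows. This is a minimal illustrative sketch, not the study's implementation: `solve_plan` is a hypothetical surrogate standing in for the full inverse optimization, its target values and the step size `delta` are invented for the toy model, and a one-step greedy lookup replaces the trained VPN's learned policy.

```python
import random

# Hypothetical surrogate for the inverse optimization: maps organ weights
# to D2cc-like organ doses. The target values are illustrative only; the
# actual study solves a full HDRBT inverse plan at this step.
def solve_plan(weights, target=(1.0, 0.8, 0.6)):
    # In this toy model, a weight closer to its target yields a lower dose.
    return [abs(w - t) for w, t in zip(weights, target)]

def reward(doses):
    # Negative weighted sum of D2cc surrogates: higher reward = lower dose.
    return -sum(doses)

def epsilon_greedy_tune(weights, steps=200, epsilon=0.1, delta=0.05, seed=0):
    """Repeatedly adjust one organ weight up or down by `delta`,
    exploring with probability epsilon and exploiting otherwise."""
    rng = random.Random(seed)
    # Action space: (organ index, signed step), as in the VPN's
    # direction-and-amplitude output.
    actions = [(i, s) for i in range(len(weights)) for s in (+delta, -delta)]
    best = reward(solve_plan(weights))
    for _ in range(steps):
        if rng.random() < epsilon:
            i, s = rng.choice(actions)  # explore: random adjustment
        else:
            # Exploit: pick the one-step adjustment with the highest reward
            # (a stand-in for the trained VPN's action selection).
            i, s = max(actions, key=lambda a: reward(solve_plan(
                [w + (a[1] if j == a[0] else 0.0)
                 for j, w in enumerate(weights)])))
        trial = weights[:]
        trial[i] += s
        r = reward(solve_plan(trial))
        if r >= best:  # keep adjustments that do not worsen the plan
            weights, best = trial, r
    return weights, best
```

In the actual study, the reward drives network training rather than a direct lookup, but the loop structure (observe plan quality, adjust one weight, re-solve, keep improvements) is the same.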
Results: Through VPN-guided weight tuning, the sum of D2cc values is reduced by 6.1% compared to randomly initialized weights, while maintaining the same CTV coverage, and by 4.6% compared to plans developed by human physicists.
Conclusion: We have demonstrated the effectiveness of DRL-based ITP in HDRBT. The method is potentially applicable to external beam therapy, given the similar structure of the optimization problems in HDRBT and external beam therapy.
Not Applicable / None Entered.