Room: Track 3
Purpose: To relieve human planners (HPs) of repetitive effort in radiotherapy treatment planning, a virtual treatment planner (VTP) was established for prostate cancer intensity-modulated radiotherapy (IMRT) by modeling the intelligent behaviors of HPs via deep reinforcement learning (DRL). Although the VTP can operate a treatment planning system (TPS) automatically to produce high-quality plans, its training process is very time-consuming, which hinders its further application to more complicated sites and modalities. To address this problem, we propose incorporating knowledge from HPs as guidance to augment DRL and improve the training efficiency of the VTP.
Methods: Similar to an HP, the VTP observes an intermediate plan and decides how to operate the TPS to improve it. Under the epsilon-greedy algorithm, DRL explores by sampling randomly among all possible operations, so the operations that actually improve plan quality are sampled with low probability. Hence, numerous training episodes are needed before the VTP learns a policy for operating the TPS. Because HPs are experienced in operating the TPS, we incorporated their knowledge to augment DRL (AugDRL): in addition to random sampling, rules summarized from HPs' experience are implemented to help the VTP explore favorable operations more efficiently. We use prostate cancer IMRT as a testbed to demonstrate the effectiveness of AugDRL.
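The exploration step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the action indices, the plan-state dictionary, the rule `increase_rectum_weight_rule`, and the mixing probability `p_rule` are all hypothetical. With probability 1 - epsilon the agent acts greedily on its Q-values; otherwise it explores, and with probability `p_rule` it first consults HP-style rules before falling back to uniform random sampling over all operations.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical rule type: maps an observed plan state to a suggested action
# index, or None when the rule does not apply. Rules stand in for HP experience.
Rule = Callable[[dict], Optional[int]]

# Hypothetical action index for an "increase rectum weighting" TPS operation.
INCREASE_RECTUM_WEIGHT = 2


def increase_rectum_weight_rule(state: dict) -> Optional[int]:
    """Hypothetical HP-style rule: if the rectum is overdosed, raise its weight."""
    return INCREASE_RECTUM_WEIGHT if state.get("rectum_overdose") else None


def select_action(
    q_values: Sequence[float],
    state: dict,
    rules: Sequence[Rule],
    epsilon: float,
    p_rule: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy action selection with optional rule-guided exploration.

    Setting p_rule = 0 recovers plain epsilon-greedy exploration.
    """
    n_actions = len(q_values)
    if rng.random() >= epsilon:
        # Exploit: choose the operation with the highest estimated value.
        return max(range(n_actions), key=lambda a: q_values[a])
    if rng.random() < p_rule:
        # Guided exploration: return the first applicable rule's suggestion.
        for rule in rules:
            suggestion = rule(state)
            if suggestion is not None:
                return suggestion
    # Unguided exploration: sample uniformly over all possible operations.
    return rng.randrange(n_actions)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.0] * 10                      # 10 hypothetical TPS operations
    state = {"rectum_overdose": True}   # hypothetical intermediate plan state
    for p_rule in (0.0, 0.5):
        picks = [
            select_action(q, state, [increase_rectum_weight_rule], 1.0, p_rule, rng)
            for _ in range(10_000)
        ]
        rate = picks.count(INCREASE_RECTUM_WEIGHT) / len(picks)
        print(f"p_rule={p_rule}: favorable operation chosen {rate:.2%} of the time")
```

With epsilon fixed at 1 (pure exploration), the demo estimates how often the rule-favored operation is sampled with and without guidance, which illustrates why rule-guided exploration can reach useful operations in fewer episodes.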
Results: We trained two VTPs, one using AugDRL and one using DRL, on 10 patient cases and applied them to 63 independent testing cases. Both VTPs spontaneously learned how to operate an in-house TPS to generate high-quality treatment plans. The VTP trained with AugDRL for eight episodes outperformed the one trained with DRL for 100 epochs, and the training time was reduced from a week to ~13 hours.
Conclusion: We have demonstrated the feasibility of using HPs' knowledge to improve the efficiency of DRL. This potentially permits the further application of VTP to more complicated cancer sites and treatment modalities.