Room: Stars at Night Ballroom 2-3
Purpose: In the daily treatment planning process, a human planner operates the treatment planning system (TPS) and adjusts optimization parameters, e.g., the locations and weights of dose-volume constraints, to achieve the best plan for each patient. This process is usually time-consuming, and plan quality depends on the planner’s experience and the available planning time. In this study, we propose to model the behavior of human planners with a deep reinforcement learning (DRL) based virtual treatment planner (VTP), such that the VTP automatically operates the TPS in a human-like manner for treatment planning.
Methods: The VTP is established using a deep neural network developed under the Q-learning framework. Similar to a human planner, the VTP observes an intermediate treatment plan and decides on an action to improve it, such as changing the weights or locations of constraints. We train the VTP in an end-to-end DRL process with an experience replay technique. An epsilon-greedy algorithm is implemented to help the VTP effectively explore the impacts of different actions and quickly learn the correct ones for improving plan quality. We demonstrate the feasibility and effectiveness of the proposed framework using treatment planning of intensity-modulated radiation therapy (IMRT) for prostate cancer as an example.
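The interplay of Q-learning, epsilon-greedy exploration, and experience replay described above can be illustrated with a minimal sketch. It is not the VTP implementation: the deep neural network is replaced by a lookup table, and the "plan" is reduced to a single hypothetical constraint weight level whose toy score peaks at an assumed level `BEST`. All names (`plan_score`, `step`, `train`, `greedy_plan`) and all hyperparameter values are illustrative assumptions.

```python
import random
from collections import deque

# Toy stand-in for a treatment plan: one discretized constraint weight.
N_STATES = 9            # hypothetical number of weight levels (0..8)
ACTIONS = (-1, 0, +1)   # decrease weight, keep it, increase it
BEST = 6                # assumed weight level with the best toy score


def plan_score(w):
    """Toy plan-quality score, highest (zero) at the BEST weight level."""
    return -abs(w - BEST)


def step(w, a_idx):
    """Apply an action; the reward is the resulting gain in plan score."""
    w_next = min(max(w + ACTIONS[a_idx], 0), N_STATES - 1)
    return w_next, plan_score(w_next) - plan_score(w)


def epsilon_greedy(q, w, eps, rng):
    """Explore a random action with probability eps, else act greedily."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q[w][a])


def train(episodes=300, steps=20, gamma=0.9, alpha=0.2,
          eps=0.3, batch=16, seed=0):
    """Tabular Q-learning with a replay buffer (table replaces the network)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    replay = deque(maxlen=2000)
    for _ in range(episodes):
        w = rng.randrange(N_STATES)          # random initial "plan"
        for _ in range(steps):
            a = epsilon_greedy(q, w, eps, rng)
            w_next, r = step(w, a)
            replay.append((w, a, r, w_next))
            # Experience replay: update from a random batch of past steps.
            sample = rng.sample(list(replay), min(batch, len(replay)))
            for s, a_s, r_s, s_next in sample:
                target = r_s + gamma * max(q[s_next])
                q[s][a_s] += alpha * (target - q[s][a_s])
            w = w_next
    return q


def greedy_plan(q, w, steps=20):
    """Follow the learned greedy policy from weight level w."""
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[w][i])
        w, _ = step(w, a)
    return w
```

Calling `q = train()` and then `greedy_plan(q, 0)` walks the toy weight from its lowest level toward `BEST`. In a setting like the one described in the abstract, the table would be replaced by a network that maps an observed intermediate plan to action values, and `step` by a call to the optimization engine.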
Results: We successfully trained the VTP for prostate IMRT treatment planning on 10 training patient cases. The VTP spontaneously learned how to operate an in-house optimization engine to generate high-quality treatment plans. After training, we applied the VTP to another 63 prostate IMRT testing cases. All of the VTP-generated plans reached the highest plan quality score.
Conclusion: To our knowledge, this is the first time that intelligent treatment planning behaviors have been autonomously encoded in an artificial intelligence system. The trained VTP is capable of behaving in a human-like way to produce high-quality plans.