Room: Track 3
The purpose of this study is to interpret the planning knowledge acquired by our reinforcement-learning-based planning agent and to validate the robustness of the model.
The central challenge of pancreas stereotactic body radiation therapy (SBRT) planning is balancing gastrointestinal organ-at-risk (OAR) sparing against planning target volume (PTV) coverage. We have recently developed a finite-horizon reinforcement-learning (RL) model that automatically interacts with the treatment planning system (TPS) to generate plans that satisfy physicians' constraints while optimizing OAR sparing. First, the planning status is represented as a set of features. Second, the steps that planners commonly take to address different planning needs are defined as actions. Third, we derived a reward system based on physician input and implemented the RL system via scripting. The RL agent was trained for 10 epochs on 48 pancreas patients, with each planning episode consisting of 20 sequential agent-TPS interactions. The trained agent then planned 24 cases in a separate validation set. Additionally, to validate the reproducibility of the training phase, we re-trained the agent with the same set of parameters. The planning strategies learned by the model were analyzed by experienced planners.
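The training schedule described above (10 epochs, 48 patients, 20 agent-TPS interactions per episode) can be sketched as a finite-horizon Q-learning loop. This is a minimal illustration only, not the authors' implementation: the ToyTPS class, the two state features, the three action names, the reward definition, the linear value function, and all hyperparameters are assumptions made for the example.

```python
import random

N_EPOCHS, N_PATIENTS, HORIZON = 10, 48, 20  # values reported in the abstract
# Hypothetical action names; the abstract does not list the real actions.
ACTIONS = ("raise_ptv_priority", "tighten_oar_constraint", "no_change")


class ToyTPS:
    """Hypothetical stand-in for a treatment planning system (not a real API)."""

    def __init__(self, seed):
        rng = random.Random(seed)
        # Assumed state features: PTV coverage deficit and OAR overdose in [0, 1].
        self.state = [rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)]

    def score(self):
        # Toy plan quality: higher is better (less deficit, less overdose).
        return -(self.state[0] + self.state[1])

    def step(self, action):
        before = self.score()
        deficit, overdose = self.state
        if action == 0:    # improve coverage at a small OAR cost
            deficit, overdose = deficit * 0.7, min(1.0, overdose + 0.02)
        elif action == 1:  # spare the OAR at a small coverage cost
            deficit, overdose = min(1.0, deficit + 0.02), overdose * 0.7
        self.state = [deficit, overdose]
        return list(self.state), self.score() - before  # reward = score gain


def q_value(weights, state, action):
    # Linear action-value function: two feature weights plus a bias per action.
    w = weights[action]
    return w[0] * state[0] + w[1] * state[1] + w[2]


def train(alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    weights = [[0.0, 0.0, 0.0] for _ in range(n_actions)]
    for _ in range(N_EPOCHS):
        for patient in range(N_PATIENTS):
            tps = ToyTPS(seed=patient)
            state = list(tps.state)
            for t in range(HORIZON):
                # Epsilon-greedy action selection.
                if rng.random() < epsilon:
                    action = rng.randrange(n_actions)
                else:
                    action = max(range(n_actions),
                                 key=lambda a: q_value(weights, state, a))
                next_state, reward = tps.step(action)
                # Finite horizon: no bootstrapping after the last interaction.
                if t == HORIZON - 1:
                    future = 0.0
                else:
                    future = max(q_value(weights, next_state, a)
                                 for a in range(n_actions))
                td_error = reward + gamma * future - q_value(weights, state, action)
                for i, x in enumerate(state + [1.0]):
                    weights[action][i] += alpha * td_error * x
                state = next_state
    return weights
```

Calling train() twice with the same seed returns identical weights in this toy setting, which mirrors in spirit the re-training check described above; a real TPS in the loop would not necessarily be deterministic.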
The planning agent generated clinically acceptable plans for all 24 validation patients. The average feature patterns associated with different planning decisions were clearly distinct, indicating that the agent takes consistent and predictable actions. More importantly, the knowledge gained by the RL agent is meaningful and in line with human planning knowledge. Additionally, the knowledge maps learned in separate training sessions were consistent (2.5% mean absolute difference).
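One way the agreement between two knowledge maps could be quantified is sketched below. The abstract does not specify how the maps are stored or normalized, so this example assumes equally shaped nested lists scaled to [0, 1] and reports the mean absolute element-wise difference as a percentage of that range; the function name and the sample values are illustrative, not data from the study.

```python
def mean_absolute_difference_percent(map_a, map_b):
    """Mean absolute element-wise difference between two maps, in percent.

    Assumes both maps are equally shaped nested lists scaled to [0, 1].
    """
    same_rows = len(map_a) == len(map_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(map_a, map_b)):
        raise ValueError("maps must have the same shape")
    diffs = [abs(a - b)
             for ra, rb in zip(map_a, map_b)
             for a, b in zip(ra, rb)]
    return 100.0 * sum(diffs) / len(diffs)


# Made-up maps from two hypothetical training runs (rows and columns are arbitrary).
run_1 = [[0.2, 0.8], [0.4, 0.6]]
run_2 = [[0.3, 0.8], [0.3, 0.6]]
difference = mean_absolute_difference_percent(run_1, run_2)
```

A small value indicates that two independently trained agents assign similar weights to the same feature-action pairs under this assumed normalization.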
We have demonstrated that the training phase of our planning agent is tractable and reproducible, and that the knowledge obtained by the agent is interpretable. As a result, the trained planning agent can be validated by human planners and can serve as a robust auto-planning routine in the clinic.
Funding Support, Disclosures, and Conflict of Interest: This work was partially supported by NIH R01CA201212 and a Varian Master Grant.