Room: Stars at Night Ballroom 2-3
Purpose: To use reinforcement learning to systematically address complex tradeoffs and physician preferences in pancreas stereotactic body radiation therapy (SBRT) treatment planning.
Methods: The focus of pancreas SBRT planning is finding a balance between gastrointestinal organ-at-risk (OAR) sparing and planning target volume (PTV) coverage. Planners evaluate dose distributions and make adjustments to optimize PTV coverage while adhering to OAR dose constraints. We formulated these interactions between the planner and the treatment planning system (TPS) as a finite-horizon reinforcement-learning (RL) model. First, planning states are discretized in a fashion similar to how planners evaluate plans (e.g., constraint satisfaction, target coverage). Second, steps that planners commonly take to address different planning needs are defined as actions. Third, we derived a "reward" system based on physician input. Finally, learning is implemented as a state-action-reward-state-action (SARSA) reinforcement-learning process with limited dimensionality to ensure convergence and performance. The RL system was trained on 14 plans, cycling through 20 epochs, each consisting of 15 sequential agent-TPS interactions. The RL agent then planned 16 cases in a separate validation set.
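The loop described above can be sketched as a tabular SARSA process. All names below (state encoding, action set, reward values, the stand-in `tps_step` function) are illustrative assumptions, not the authors' actual implementation; a real agent would query the TPS for the re-optimized dose distribution at each step.

```python
import random

random.seed(0)  # deterministic sketch

# Hypothetical discretized planning states: (GI constraint met?, coverage bin 0-2)
STATES = [(oar_ok, cov) for oar_ok in (False, True) for cov in range(3)]
# Hypothetical planner-style actions
ACTIONS = ["tighten_oar_weight", "boost_ptv_weight", "no_change"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(state):
    """Epsilon-greedy policy over the tabular Q-function."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def reward(state):
    """Placeholder physician-derived reward: penalize constraint
    violation, favor higher target coverage."""
    oar_ok, cov = state
    return (1.0 if oar_ok else -1.0) + 0.5 * cov

def tps_step(state, action):
    """Toy stand-in for one TPS re-optimization under the chosen action."""
    oar_ok, cov = state
    if action == "tighten_oar_weight":
        oar_ok = True
    elif action == "boost_ptv_weight":
        cov = min(cov + 1, 2)
    return (oar_ok, cov)

def sarsa_episode(start, n_steps=15):
    """One training episode: 15 sequential agent-TPS interactions."""
    s, a = start, choose_action(start)
    for _ in range(n_steps):
        s2 = tps_step(s, a)
        r = reward(s2)
        a2 = choose_action(s2)
        # On-policy SARSA update, using the action actually taken next
        Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])
        s, a = s2, a2

for _ in range(20):  # 20 training epochs
    sarsa_episode((False, 0))
```

The deliberately small state-action table mirrors the limited dimensionality mentioned above, which keeps tabular SARSA tractable and convergent.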
Results: The RL agent took 18 minutes to plan each validation case, in contrast to 1-2 hours of manual planning. All 16 clinical plans and all 16 RL plans met the pre-defined GI constraint (V30Gy < 1 cc). Primary PTV coverage was significantly higher for RL plans (98.9% ± 0.5%) than for clinical plans (97.6% ± 0.7%) (paired-sample t-test, p = 0.006), while simultaneous integrated boost PTV coverage was similar for clinical plans (87.6% ± 4.9%) and RL plans (86.7% ± 5.2%) (p = 0.63).
Conclusion: The reinforcement-learning process is capable of capturing planner experience and prior knowledge of pancreas SBRT planning. This study demonstrates that the RL agent achieves plan quality comparable to that of human planners with substantially shorter planning time.
Funding Support, Disclosures, and Conflict of Interest: This work is supported in part by NIH/NCI grant R01CA201212 and a master research grant from Varian Medical Systems.