
Automating Beam Orientation Optimization for IMRT Treatment Planning: A Deep Reinforcement Learning Approach

O Ogunmolu1,2*, D Nguyen1, C Shen1, X Jia1, W Lu1, N Gans2, S Jiang1, (1) Medical Artificial Intelligence and Automation Laboratory, UT Southwestern Medical Center, Dallas, TX, (2) Sensing, Robotics, Vision, Control and Estimation Laboratory, University of Texas at Dallas, Richardson, Texas


(Sunday, 7/29/2018) 1:00 PM - 1:55 PM

Room: Room 209

Purpose: This work focuses on automating beam orientation optimization (BOO) during IMRT treatment planning. Because BOO is a nonconvex problem, we develop a hybrid approach that combines deep reinforcement learning (DRL) with fluence map optimization (FMO) to find beam orientations that yield a satisfactory dose distribution and overall IMRT treatment plan quality.

Methods: We delineate each patient’s contours into separate structures, then parameterize them with a deep convolutional Q-network (DQN), which outputs a set of Q-values. The predicted Q-values indicate which beam angle to tune and by how much the selected angle should be adjusted; this enables adequate exploration of the possible beam angles for each patient under consideration. We score the dose delivered at the explored angles using the associated (inverted) FMO cost as the reward. Each iteration’s Markov decision process variables are stored in an experience replay buffer for a sufficient number of episodes. Once the exploration phase ends, we exploit past experiences by optimizing over the replay buffer with dynamic programming, which essentially maximizes the expectation of the Hamilton-Jacobi-Bellman cost-to-go function. We thus arrive at a set of candidate beam angles that yield good treatment plan quality.
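The explore-score-replay loop described above can be sketched in miniature. This is a toy illustration, not the authors’ implementation: the deep convolutional Q-network is replaced by a linear Q function over one-hot beam-angle features, and the FMO cost is replaced by a hypothetical geometric surrogate (variance of angular gaps between beams). All names (`fmo_cost`, `step`, `featurize`) and constants are invented for illustration.

```python
import random
import numpy as np

N_BEAMS = 5     # beams in a plan (toy value)
N_ANGLES = 36   # 10-degree angular resolution (toy value)
# Each action picks one beam and rotates it one step clockwise or counterclockwise.
ACTIONS = [(b, d) for b in range(N_BEAMS) for d in (-1, +1)]

def fmo_cost(angles):
    """Hypothetical stand-in for the FMO cost: the abstract uses the inverted
    FMO cost as the reward; here evenly spread beams get a low cost."""
    theta = np.sort(np.array(angles) * (360.0 / N_ANGLES))
    gaps = np.diff(np.concatenate([theta, [theta[0] + 360.0]]))
    return float(np.var(gaps))

def step(angles, action):
    """Apply one beam-angle adjustment and score it with the inverted cost."""
    beam, delta = action
    new = list(angles)
    new[beam] = (new[beam] + delta) % N_ANGLES
    return new, -fmo_cost(new)

def featurize(angles):
    """One-hot encoding of each beam's angle bin (toy state representation)."""
    x = np.zeros(N_BEAMS * N_ANGLES)
    for b, a in enumerate(angles):
        x[b * N_ANGLES + a] = 1.0
    return x

rng = random.Random(0)
W = np.zeros((len(ACTIONS), N_BEAMS * N_ANGLES))  # linear Q in place of the DQN
replay = []                                       # experience replay buffer
GAMMA, ALPHA, EPS = 0.9, 0.05, 0.2

angles = [rng.randrange(N_ANGLES) for _ in range(N_BEAMS)]
for episode in range(200):
    phi = featurize(angles)
    # Epsilon-greedy exploration over the discrete action space.
    if rng.random() < EPS:
        a_idx = rng.randrange(len(ACTIONS))
    else:
        a_idx = int(np.argmax(W @ phi))
    new_angles, reward = step(angles, ACTIONS[a_idx])
    replay.append((angles, a_idx, reward, new_angles))
    angles = new_angles
    # Sample a minibatch from the replay buffer and apply the Bellman update.
    for s, a, r, s2 in rng.sample(replay, min(8, len(replay))):
        phi_s, phi_s2 = featurize(s), featurize(s2)
        target = r + GAMMA * float(np.max(W @ phi_s2))
        td_error = target - float(W[a] @ phi_s)
        W[a] += ALPHA * td_error * phi_s

print("final surrogate cost:", fmo_cost(angles))
```

The candidate plan is the final beam-angle set; in the abstract’s full method the reward comes from a real FMO solve and the replay buffer is further optimized with dynamic programming after exploration ends.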

Results: Examination of dose wash plots for the patients we considered shows good dose delivery at the resulting beam angles. The DVH plots also show that the target volume received clinically acceptable dose coverage while the other organs stayed well below their tolerances.

Conclusion: Inspired by learning through exploration in high-dimensional state spaces, we applied a variant of DQN to BOO in a discrete action space. By formulating the DQN reward as an FMO cost function, we obtained good beam angles for effective dose delivery. In the future, we will consider continuous action spaces and run the algorithm on real-life cases.

Funding Support, Disclosures, and Conflict of Interest: This project is supported by the Cancer Prevention and Research Institute of Texas (CPRIT) (IIRA RP150485).


Intensity Modulation, Optimization, Treatment Planning


TH- External beam- photons: IMRT dose optimization algorithms
