
Reinforcement Learning for Fast and Intelligent Radiation Therapy Optimization

W Hrinivich*, J Lee, Johns Hopkins University, Baltimore, MD


(Tuesday, 7/14/2020) 3:30 PM - 5:30 PM [Eastern Time (GMT-4)]

Room: Track 3

Purpose: Volumetric modulated arc therapy (VMAT) optimization is a complex, high-dimensional problem that remains a time-consuming component of treatment planning. Deep learning approaches have been proposed to augment existing optimizers but, to our knowledge, have not been directly applied to VMAT machine parameter optimization (MPO). Reinforcement learning (RL) is an unsupervised machine learning approach in which an algorithmic agent learns to maximize a reward value through trial and error without labelled training data. In this proof-of-principle study, we applied RL to VMAT MPO in prostate cancer and assessed the resultant dose distributions in an independent cohort.

Methods: We implemented a deep-Q RL technique for VMAT MPO in a two-dimensional beam model. The approach incorporates a six-layer convolutional neural network which predicts the long-term quality (Q) value for a set of discrete machine parameter updates. Q-values were computed using the inverse of a conventional intensity-modulated radiotherapy (IMRT) objective function. The model was trained using CT scans and contours from 10 prostate cancer patients and validated using an independent cohort of 10 patients. We compared the RL-based plans to conformal arc plans and clinical IMRT plans.
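
The core ingredients of a deep-Q approach of this kind can be illustrated with a minimal Python sketch. It is not the authors' implementation: the convolutional network is omitted and a plain list of Q-values stands in for its output, the quadratic objective is a toy stand-in for a conventional IMRT objective, and all names and parameters (objective, reward, select_action, td_target, eps, epsilon, gamma) are assumptions made for illustration.

```python
import random

def objective(dose, prescribed):
    """Toy quadratic objective (assumed): mean squared deviation from prescription."""
    return sum((d - p) ** 2 for d, p in zip(dose, prescribed)) / len(dose)

def reward(dose, prescribed, eps=1e-6):
    """Reward as the inverse of the objective; eps (assumed) avoids division by zero."""
    return 1.0 / (objective(dose, prescribed) + eps)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over a discrete set of machine parameter updates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random update
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: best Q

def td_target(r, next_q_values, gamma=0.9, terminal=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if terminal:
        return r
    return r + gamma * max(next_q_values)

# Usage with hypothetical numbers: three candidate updates, greedy selection.
rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # stand-in for the network's predicted Q-values
action = select_action(q_values, epsilon=0.0, rng=rng)
target = td_target(reward([1.0, 3.0], [1.0, 2.0]), q_values)
```

In a full implementation, the network would be trained to move its predicted Q-value for the chosen update toward the target returned by td_target.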

Results: Following 3 days of training, optimizing new plans in the validation cohort using the model took 20.8±4.3 s per patient. Compared to the conformal arc and clinical IMRT plans, the RL-based VMAT plans provided mean±SD (p-value) differences in prostate V100% of -1.22±0.69% (<.001) and -0.75±2.79% (.207), bladder V75% of -5.06±2.70% (<.001) and -7.26±14.09% (.069), and rectum V75% of -18.00±4.96% (<.001) and 7.68±8.19% (.008), respectively.
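
For readers unfamiliar with the mean±SD difference format used above, the following Python sketch shows how such a summary could be computed from paired per-patient values. The numbers are hypothetical and are not the study data; the function name paired_difference_summary and the use of the sample standard deviation are assumptions, and the abstract does not state which statistical test produced the p-values.

```python
from statistics import mean, stdev

def paired_difference_summary(rl_values, reference_values):
    """Return (mean, sample SD) of per-patient differences, RL minus reference."""
    diffs = [a - b for a, b in zip(rl_values, reference_values)]
    return mean(diffs), stdev(diffs)

# Hypothetical V75% values (percent of organ volume) for four patients.
rl_plan = [10.0, 12.0, 9.0, 11.0]
reference_plan = [15.0, 18.0, 13.0, 16.0]

m, s = paired_difference_summary(rl_plan, reference_plan)
summary = f"{m:.2f}±{s:.2f}%"
print(summary)
```

A negative mean difference indicates that the RL-based plans irradiated a smaller volume of the organ to the given dose level than the comparison plans.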

Conclusions: We have implemented an RL technique for VMAT MPO that maintained target coverage and decreased normal tissue dose compared to conformal arc plans without requiring previous treatment plans for training. Further improvements in network design and computing hardware may enable significant decreases in the time required for clinical VMAT planning.


Treatment Planning, Optimization, Prostate Therapy


TH- External Beam- Photons: IMRT/VMAT dose optimization algorithms
