
Predicting Treatment Plans with Reinforcement Learning

A Babier1*, A McNiven2, TCY Chan3, (1) University of Toronto, Toronto, ON, CA, (2) Princess Margaret Cancer Centre, Toronto, ON, CA, (3) University of Toronto, Toronto, ON, CA


(Tuesday, 7/14/2020) 3:30 PM - 5:30 PM [Eastern Time (GMT-4)]

Room: Track 3

Purpose: To create a new knowledge-based planning (KBP) paradigm that outperforms the conventional predict-optimize pipeline in both speed and the quality of the plans it produces.

Methods: We trained two KBP prediction models that use generative adversarial networks (GANs) on a dataset of 120 clinical oropharyngeal radiotherapy plans. The first, Dose-GAN, is a standard KBP model that predicts full dose distributions, and the second, Fluence-GAN, is a novel model that leverages dose kernels to quickly infer the fluence map of a dose distribution. These models were combined to initialize our RL-KBP model, which was then used to predict fluence maps for 72 out-of-sample patients. As a baseline, a published KBP model was implemented with Dose-GAN and a standard plan optimization method. Altogether, we used the fluence maps to construct two sets of KBP plans (RL-KBP plans and baseline-KBP plans) that were benchmarked against the clinical plans using clinical planning criteria to quantify plan quality. We also benchmarked the median error and compute time of Fluence-GAN against the standard optimization method from baseline-KBP.
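For readers unfamiliar with how a fluence map relates to a dose distribution, the following is a minimal sketch, not the authors' implementation. It assumes the common linear model in which dose is a dose influence matrix (built from dose kernels) multiplied by a vector of beamlet intensities, and it uses projected gradient descent as a stand-in for a generic plan optimization step. The function names, the toy matrix sizes, and the random data are all hypothetical.

```python
import numpy as np

def dose_from_fluence(influence, fluence):
    """Dose per voxel as a linear function of beamlet intensities (assumed model)."""
    return influence @ fluence

def fit_fluence(influence, target_dose, n_iters=2000):
    """Projected gradient descent for min 0.5*||D w - d||^2 subject to w >= 0.

    A generic stand-in for a plan optimization step, not the method
    used in the abstract.
    """
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(influence, 2) ** 2
    w = np.zeros(influence.shape[1])
    for _ in range(n_iters):
        grad = influence.T @ (influence @ w - target_dose)
        # Project onto the non-negative orthant: fluence cannot be negative.
        w = np.maximum(w - step * grad, 0.0)
    return w

# Toy problem (hypothetical sizes): 50 voxels, 10 beamlets.
rng = np.random.default_rng(0)
D = rng.random((50, 10))
w_true = rng.random(10)
d = dose_from_fluence(D, w_true)

w_fit = fit_fluence(D, d)
# Median absolute dose error between the fitted plan and the target dose.
median_error = np.median(np.abs(dose_from_fluence(D, w_fit) - d))
```

Under this assumed model, a learned fluence predictor such as Fluence-GAN would replace the iterative loop with a single forward pass, which is consistent with the reported difference in compute time.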

Results: The RL-KBP plans satisfied clinical criteria more frequently (70.9%) than both the baseline-KBP plans (65.4%) and clinical plans (66.2%). On average, the RL-KBP plans also improved upon the clinical plans by 1.1 Gy over the dose-volume criteria assessed; the baseline-KBP plans only improved upon the criteria by 0.7 Gy. Additionally, Fluence-GAN inferred plans in one second, which is three orders of magnitude faster than the standard optimization, and with a smaller median error (1.7 Gy) than the standard optimization method (2.0 Gy).
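As an illustration of how a criteria satisfaction rate like the percentages above might be tallied, here is a minimal sketch. The criterion names, limits, and plan values are hypothetical and are not taken from the study; the abstract does not list the criteria that were assessed.

```python
def satisfies(value, limit, kind):
    """Check one criterion: 'upper' means value <= limit, 'lower' means value >= limit."""
    return value <= limit if kind == "upper" else value >= limit

def satisfaction_rate(plans, criteria):
    """Fraction of (plan, criterion) pairs that are satisfied."""
    checks = [
        satisfies(plan[name], limit, kind)
        for plan in plans
        for name, (limit, kind) in criteria.items()
    ]
    return sum(checks) / len(checks)

# Hypothetical criteria in Gy: an upper limit for an organ at risk
# and a lower limit for a target volume.
criteria = {
    "brainstem_max": (54.0, "upper"),
    "ptv70_d99": (66.5, "lower"),
}

# Hypothetical achieved values for two plans.
plans = [
    {"brainstem_max": 50.0, "ptv70_d99": 67.0},
    {"brainstem_max": 55.0, "ptv70_d99": 68.0},
]

rate = satisfaction_rate(plans, criteria)  # 3 of 4 checks pass
```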

Conclusion: Our RL-KBP model can predict plans that are superior to both clinical plans and plans generated by a conventional KBP pipeline that used the same dose prediction model. Additionally, Fluence-GAN dominates the standard plan optimization approach in terms of solve time and accuracy.

Funding Support, Disclosures, and Conflict of Interest: This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

