Room: Track 2
Purpose: To develop an autonomous computer program based on quantum reinforcement learning (QRL) that can learn complex hidden patterns from patients’ biophysical features and provide support to adapt dose fractionation for individuals based on estimated tumor and normal tissue responses.
Methods: A total of 68 non-small cell lung cancer (NSCLC) patients who received radiotherapy (RT) were considered in creating a model-based QRL agent. We employed deep neural networks (DNNs) as the Transition and Reward functions to approximate the RT environment as part of a Markov decision process. RT state features were selected from a multi-objective Bayesian network: IP10, GLSZM-ZSV, cxcr1-Rs2234671, Tumor-gEUD, and Lung-gEUD. The Reward function, a function of the local control (LC) and grade 2 radiation-induced pneumonitis (RP2) probabilities, was designed such that the agent received a positive reward for dose-per-fraction selections that resulted in P(LC)>70% and P(RP2)<17.2%, and a negative reward otherwise. The four actions, corresponding to 2, 2.5, 3, and 3.5 Gy/frac, were represented as a superposition over the 4 basis states of 2 qubits, implemented with the Qiskit library. The QRL agent chose optimal actions through Grover amplitude amplification and quantum measurement, following Bellman's optimality update rule.
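The reward rule and the amplitude-amplification step can be illustrated with a minimal sketch. It simulates the 2-qubit state vector in plain NumPy rather than Qiskit, and it makes several assumptions not stated in the abstract: the reward magnitudes (+1 and -1), the mapping of basis-state index to dose, and a fixed number of Grover iterations. All function names are hypothetical, and the sketch does not reproduce how the trained agent sets the amplification from its learned values.

```python
import numpy as np

# Assumed mapping: basis states |00>, |01>, |10>, |11> -> dose per fraction (Gy)
ACTIONS_GY = [2.0, 2.5, 3.0, 3.5]

def reward(p_lc, p_rp2, lc_min=0.70, rp2_max=0.172):
    """Positive reward only when both thresholds are met.
    The magnitudes +1 / -1 are an assumption; the abstract gives only the sign."""
    return 1.0 if (p_lc > lc_min and p_rp2 < rp2_max) else -1.0

def grover_iteration(amplitudes, marked):
    """One Grover step on real amplitudes: oracle phase flip, then diffusion."""
    amplitudes = amplitudes.copy()
    amplitudes[marked] *= -1.0                   # oracle: flip the marked action
    return 2.0 * amplitudes.mean() - amplitudes  # inversion about the mean

def action_probabilities(marked, n_iterations):
    """Measurement probabilities after amplifying one marked action."""
    amplitudes = np.full(len(ACTIONS_GY), 0.5)   # uniform 2-qubit superposition
    for _ in range(n_iterations):
        amplitudes = grover_iteration(amplitudes, marked)
    return np.abs(amplitudes) ** 2

# Example: amplify the action with index 2 (3.0 Gy/frac under the assumed mapping)
probs = action_probabilities(marked=2, n_iterations=1)
```

With four basis states, a single Grover iteration concentrates all measurement probability on the marked action, and a second iteration returns the distribution to uniform. The number of iterations therefore controls how strongly a favored action is selected.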
Results: The Transition function yielded RMSEs of 3.07e-3, 3.74e-2, 3.32e-3, and 1.35e-2 for the normalized state features IP10, GLSZM-ZSV, Tumor-gEUD, and Lung-gEUD, respectively. The LC and RP2 DNN classifiers yielded accuracies of 82% and 76%, respectively. We ran 10,000 training episodes for 3 separate QRL agents, which suggested mean ± standard deviation dose fractions of 2.81±0.63, 2.96±0.55, and 2.93±0.56 Gy/frac, with RMSEs relative to the clinical decisions of 0.99, 0.92, and 0.94 Gy/frac, respectively.
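As a rough illustration of how such summary figures are typically computed, the sketch below derives the mean, standard deviation, and RMSE of agent-suggested doses against clinical doses. The function name and the input values are hypothetical and are not study data. The sample standard deviation (ddof=1) is also an assumption, since the abstract does not say which estimator was used.

```python
import numpy as np

def summarize_doses(agent_doses, clinical_doses):
    """Return (mean, sample std, RMSE vs. clinical) in Gy/frac."""
    agent = np.asarray(agent_doses, dtype=float)
    clinical = np.asarray(clinical_doses, dtype=float)
    rmse = float(np.sqrt(np.mean((agent - clinical) ** 2)))
    # ddof=1 (sample standard deviation) is an assumed convention
    return float(agent.mean()), float(agent.std(ddof=1)), rmse

# Hypothetical values for illustration only
mean_dose, std_dose, rmse = summarize_doses(
    agent_doses=[2.0, 3.0, 3.5, 2.5],
    clinical_doses=[2.0, 2.0, 3.0, 3.0],
)
```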
Conclusion: This work demonstrates that QRL along with DNNs can be a viable tool for RT clinical decision support for response-adapted radiotherapy. To increase accuracy, a greater number of qubits will be used in the future and independent validation will be conducted.