Room: Exhibit Hall | Forum 2
Purpose: To quantify the relative importance of an array of complexity metrics and plan characteristics for predicting patient-specific quality assurance (QA) outcomes using a machine learning algorithm.
Methods: Using in-house software, 255 features were computed or extracted from the VMAT plans of 500 patients previously treated at our institution, along with gamma passing rates (GPRs) from the corresponding QA measurements. Numerical features, such as modulation complexity score, average leaf travel per degree of gantry rotation, and average aperture area, were generally derived from MLC positions and segment monitor units (MUs). Categorical features included treatment site, treatment machine, and whether a flattening filter was used. Feature importance was estimated by averaging results from 50 forests of extremely randomized decision trees, where the number of trees in each forest equaled the number of samples.
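The averaging step described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn's `ExtraTreesRegressor` with impurity-based importances, treats GPR as a regression target, and runs on synthetic stand-in data. The helper name `averaged_importances`, the data, and the reduced forest count are hypothetical and are not taken from the in-house software.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def averaged_importances(X, y, n_forests=50, n_trees=None, seed=0):
    """Average impurity-based feature importances over several extra-trees forests."""
    n_samples, n_features = X.shape
    if n_trees is None:
        # Assumption: one reading of "number of trees equaled the number of samples".
        n_trees = n_samples
    totals = np.zeros(n_features)
    for i in range(n_forests):
        # A different seed per forest gives independent randomized forests.
        forest = ExtraTreesRegressor(n_estimators=n_trees, random_state=seed + i)
        forest.fit(X, y)
        totals += forest.feature_importances_
    return totals / n_forests


# Synthetic stand-in data (not the study's data): 60 "plans", 8 numerical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
# Hypothetical GPR-like target driven mostly by feature 0.
y = 95.0 - 2.0 * X[:, 0] + rng.normal(scale=0.5, size=60)

# Fewer forests than the 50 in the abstract, to keep the sketch fast.
importances = averaged_importances(X, y, n_forests=5)

# Features whose importance exceeds the mean importance of all features.
above_average = np.flatnonzero(importances > importances.mean())
# Feature indices ordered from most to least important.
ranking = np.argsort(importances)[::-1]
```

Categorical features such as treatment site or machine would need to be encoded numerically before fitting. One-hot encoding is one option, but the encoding actually used is not stated in the abstract.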
Results: Of the 255 features, 111 had a relative importance greater than the average importance of all features. The most important numerical features were MU factor (ranked 1st), maximum small-aperture score at 40 mm (3rd), and average weighted leaf travel (5th), while the most important categorical features included the treatment site of ‘Lung’ (2nd) and treatment machine of ‘Versa’ (6th). Complexity features were not strongly correlated with GPRs: the average leaf gap metric had the highest Pearson correlation coefficient with GPRs (R = 0.40, p < 0.001), specifically at the 3%/1 mm criterion with local normalization. Principal component analysis revealed that 30 principal components retained 95% of the overall dataset variance.
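The two summary statistics reported in the Results, a Pearson correlation coefficient and the number of principal components needed to retain 95% of the variance, can be sketched with NumPy alone. The helper names and the synthetic data are hypothetical. Standardizing the features before the decomposition is an assumption, since the abstract does not state the preprocessing. The sketch does not compute p-values.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    # Assumption: features are standardized first (no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative))


# Synthetic stand-in data (not the study's data): 10 features that are
# noisy mixtures of 3 latent factors, so few components should suffice.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + rng.normal(scale=0.05, size=(200, 10))

k = n_components_for_variance(X, target=0.95)

# Hypothetical GPR-like values, weakly related to the first feature.
gpr = 95.0 + 0.5 * X[:, 0] + rng.normal(scale=2.0, size=200)
r = pearson_r(X[:, 0], gpr)
```

A highly redundant feature set, as in this synthetic example, needs far fewer components than features, which is the same pattern as 30 components covering 255 features.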
Conclusion: Relative importances of 255 complexity metrics for predicting GPRs were assessed on a dataset of 500 VMAT plans and QA measurements. The impact of feature selection will be assessed during development of a machine learning model for predicting GPRs, including whether separate models for categorical features are more robust than a composite model.
Feature Selection, Quality Assurance, Radiation Therapy
TH- Dataset analysis/biomathematics: Machine learning techniques