
Unboxing Artificial Intelligence "Black-Box" Models - A Novel Heuristic

S Weppler1,2*, H Quon1,2, N Harjai1, C Beers1, L Van Dyke2, C Kirkby1,2,3, C Schinkel1,2, W Smith1,2, (1) University of Calgary, Calgary, AB, CA, (2) Tom Baker Cancer Centre, Calgary, AB, CA, (3) Jack Ady Cancer Centre, Lethbridge, AB, CA.


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Purpose: Artificial intelligence (AI) models capture complex associations in a dataset. However, it can be difficult to translate these associations into useful clinical practice guidelines. We introduce a novel heuristic to simplify AI models used for patient selection and identifying QA violations into intuitively understandable sets of inequalities, emulating the “black-box” process.

Methods: The heuristic first simulates new data based on existing data points. It then identifies when incremental changes in input variables cause model predictions to transition from one classification to another. Transition values are aggregated across the simulated data, and the most common transition values provide candidate inequality cutoff values. Final simplified criteria are based on the combination of candidate cutoff values maximizing sensitivity and specificity, producing requirements of the form: If x₁ ≥ c₁ OR x₂ ≥ c₂ OR … then "classification". Two applications are investigated: 1) patient selection for head and neck adaptive radiation therapy (ART) using random forests and 2) workflow-specific refinement of deformable image registration (DIR) QA criteria using lasso logistic regression. We compare the performance of the simplified criteria with full models and with criteria from the literature.
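The steps above can be sketched in code. This is an illustrative assumption of how the heuristic might be implemented, not the authors' actual code: a toy random forest is swept along one feature to find where its predicted class flips ("transition values"), the most common rounded transition value is taken as a candidate cutoff, and the resulting single-inequality rule is scored by sensitivity and specificity. All names and the synthetic dataset are hypothetical.

```python
# Hypothetical sketch of the abstract's simplification heuristic.
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0.5).astype(int)  # ground truth depends only on feature 0
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def transition_values(model, X, feature, grid):
    """Grid values of `feature` at which the model's predicted class flips."""
    flips = []
    for x in X:
        rows = np.tile(x, (len(grid), 1))
        rows[:, feature] = grid          # sweep one feature, hold the rest fixed
        preds = model.predict(rows)
        idx = np.nonzero(np.diff(preds))[0]
        flips.extend(grid[idx].round(1))  # coarse rounding so values aggregate
    return flips

grid = np.linspace(-2, 2, 81)
counts = Counter(transition_values(model, X, 0, grid))
cutoff = counts.most_common(1)[0][0]     # most frequent transition value

# Simplified one-inequality criterion: classify as 1 if x0 >= cutoff
rule_pred = (X[:, 0] >= cutoff).astype(int)
sens = np.mean(rule_pred[y == 1] == 1)
spec = np.mean(rule_pred[y == 0] == 0)
```

In a real application, the final criteria would combine the best candidate cutoffs across several features with OR, chosen to maximize sensitivity and specificity jointly.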

Results: In both applications, the simplified criteria consisted of at most 8 inequality items, compared to the complex architecture of the full AI models (e.g., random forest ensembles of 1000 regression trees). The heuristically simplified ART patient selection criteria achieved sensitivity >0.80 and specificity ~0.70 across multiple dosimetric endpoints, comparable to the full AI models. For DIR QA, the simplified criteria lost specificity (sensitivity=0.80, specificity=0.57) relative to the full lasso model (sensitivity=0.72, specificity=0.88). However, the AI-based simple criteria outperformed a naïve application of standard QA recommendations (sensitivity=0.57, specificity=0.68). Sensitivity was prioritized in both applications.

Conclusion: This heuristic technique simplified the classification criteria for both applications investigated and may be valuable in scenarios where full AI models cannot be integrated into clinical practice.

Funding Support, Disclosures, and Conflict of Interest: This work was supported by the Natural Sciences and Engineering Research Council of Canada. The study team has no relevant financial disclosures or conflicts of interest to declare.


Modeling, Feature Selection, Conformal Radiotherapy


IM/TH- Mathematical/Statistical Foundational Skills: Machine Learning
