Room: AAPM ePoster Library
Radiomics analyzes quantitative features extracted from tumour images and uses them to build mathematical models that infer prognostic information. Given the number of regression models available, a clinician may need to know the efficacy and interpretability of each. This study compares the efficacy of commonly used regression models.
A dataset consisting of the CT images of 422 non-small-cell lung carcinoma patients was studied. For each tumour mask, 107 features were extracted using PyRadiomics, a Python library for extracting radiomics features. The features were then used in two binary classification problems: survival beyond 1 year (or not) and survival beyond 3 years (or not). Censored patients were excluded based on the registered survival time.
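The labelling step described above can be sketched as follows. This is a minimal illustration, not the study's code: the record layout, the function names, and the rule that a patient is dropped only when censored before the horizon are all assumptions, since the abstract does not spell out the exclusion rule.

```python
from typing import List, Optional, Tuple


def survival_label(survival_days: float, event_observed: bool,
                   horizon_days: float) -> Optional[int]:
    """Binary label at a given horizon (hypothetical helper).

    Returns 1 if the patient is known to have survived past the horizon,
    0 if death was observed at or before the horizon, and None if the
    patient was censored before the horizon (outcome unknown).
    """
    if survival_days > horizon_days:
        return 1  # follow-up extends past the horizon, so survival is known
    if event_observed:
        return 0  # death recorded at or before the horizon
    return None   # censored before the horizon: assumed to be excluded


def build_labels(records: List[Tuple[float, bool]],
                 horizon_days: float) -> List[int]:
    """Label every record and drop the ones whose outcome is unknown."""
    labels = [survival_label(t, e, horizon_days) for t, e in records]
    return [lab for lab in labels if lab is not None]


# Made-up records: (registered survival time in days, death observed?)
patients = [(500.0, True), (200.0, True), (300.0, False), (1500.0, False)]

labels_1y = build_labels(patients, 365.0)
labels_3y = build_labels(patients, 3 * 365.0)
```

In a real pipeline the same mask would also be applied to the feature rows so that features and labels stay aligned.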
The dataset was divided into training and test sets for the analysis. Heavy skewness in some features was mitigated through a logarithmic or a Box-Cox transform. Five models were investigated in this study: logistic ridge regression, an (unboosted) decision tree classifier, a random forest classifier, an XGBoost decision tree classifier, and a support vector machine (SVM).
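To illustrate how a logarithmic or Box-Cox transform reduces skewness, here is a small sketch in plain Python. The feature values are made up, and the Box-Cox parameter is fixed by hand rather than fitted; in practice `scipy.stats.boxcox` can estimate it by maximum likelihood. The abstract does not say how the study chose the transform or its parameter.

```python
import math
from typing import List


def skewness(values: List[float]) -> float:
    """Sample skewness: third central moment over the variance to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def box_cox(values: List[float], lam: float) -> List[float]:
    """Box-Cox transform for strictly positive values.

    lam == 0 reduces to the natural logarithm; otherwise (x**lam - 1) / lam.
    """
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Made-up, heavily right-skewed feature values (not from the study)
feature = [1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 9.0, 20.0, 60.0]

skew_raw = skewness(feature)
skew_log = skewness(box_cox(feature, 0.0))  # logarithmic transform
```

Comparing `skew_raw` with `skew_log` shows how much of the long right tail the transform removes for this toy feature.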
For the 1-year survival classification problem, logistic ridge regression achieved an accuracy of 0.619, compared with 0.659 for the decision tree, 0.659 for the random forest, 0.675 for the XGBoost decision tree, and 0.619 for the SVM.
For the 3-year survival classification problem, logistic ridge regression achieved an accuracy of 0.667, compared with 0.675 for the decision tree, 0.675 for the random forest, 0.643 for the XGBoost decision tree, and 0.675 for the SVM.
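A comparison of this kind can be sketched with scikit-learn as below. Everything here is an assumption made for illustration: the data are synthetic (only the feature count of 107 matches the study), the 70/30 split and all hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch needs a single library. It will not reproduce the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the radiomics feature matrix and binary labels
X, y = make_classification(n_samples=400, n_features=107,
                           n_informative=10, random_state=0)

# Assumed 70/30 split; the abstract does not state the proportions used
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    # LogisticRegression applies an L2 (ridge) penalty by default
    "logistic ridge regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost (a different gradient-boosting implementation)
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each model on the training set and score it on the held-out test set
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Scaling is wrapped into the pipelines for the logistic regression and the SVM because both are sensitive to feature scale, whereas the tree-based models are not.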
Although the models differ somewhat in performance, the variation is limited. This underlines the importance of feature engineering rather than relying on the choice of model to boost performance. Since all regression models perform comparably, interpretability should also be a consideration when choosing a specific model.