Room: Exhibit Hall | Forum 8
Purpose: Automated PET segmentation methods are predominantly validated in lung or head/neck cancers. Here, the performance and robustness of ten segmentation methods are assessed in lymphoma, a cancer that is difficult to segment because of extreme heterogeneity in tumor size, contrast, and location.
Methods: Tumors on ¹⁸F-FDG PET/CT scans of 41 lymphoma patients were segmented by a radiologist. Algorithms implemented included thresholding (40% and 50% SUVmax), level-sets, clustering (fuzzy c-means, spatial distance weighted fuzzy c-means, k-means++), adaptive region-growing, and a combination level-sets method initialized by fuzzy c-means output. Two patch-based 3D convolutional neural networks (CNNs) were trained and tested using 5-fold cross-validation: a dual-pathway CNN (DeepMedic) and a U-net. Methods were assessed using lesion Dice coefficients and modified Hausdorff distance (MHD), and were ranked by median performance. Methods that were not significantly different (p>0.01 by Wilcoxon signed-rank test for clustered data) were assigned tied ranks. Robustness of each method with respect to tumor volume, tumor-to-background ratio (TBR), and initial input was assessed using Wilcoxon rank tests.
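The fixed-threshold segmentation and the two evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the `roi_mask` input, and the voxel `spacing` argument are hypothetical, and the MHD shown is one common definition (the larger of the two directed mean nearest-neighbor distances), computed over all mask voxels rather than surfaces. The abstract does not state which variant was used.

```python
import numpy as np

def threshold_segment(suv, roi_mask, fraction=0.4):
    """Keep ROI voxels whose SUV is at least `fraction` of the ROI's SUVmax."""
    suv_max = suv[roi_mask].max()
    return roi_mask & (suv >= fraction * suv_max)

def dice(a, b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def modified_hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Assumed MHD variant: max of the two directed mean nearest distances.

    Uses every voxel in each mask (not only the surface) and a brute-force
    pairwise distance matrix, so it suits small lesions only.
    Both masks must be non-empty.
    """
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

# Toy volume with hypothetical SUV values, for illustration only.
suv = np.zeros((5, 5, 5))
suv[1:4, 1:4, 1:4] = 2.0
suv[2, 2, 2] = 10.0
roi = np.ones_like(suv, dtype=bool)
reference = suv >= 2.0                      # stand-in for a manual contour
seg = threshold_segment(suv, roi, fraction=0.4)
score = dice(seg, reference)
distance = modified_hausdorff(seg, reference)
```

In this toy case the single hot voxel sets SUVmax, so the 40% threshold keeps only that voxel and the Dice score against the larger reference region is low, which mirrors the sensitivity of fixed thresholds to heterogeneous uptake.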
Results: In total, 495 tumors with median TBR and volume of 4.7 and 1.8 cm³, respectively, were available for analysis. DeepMedic, level-sets, and 50% SUVmax tied for first rank in Dice coefficient. DeepMedic was ranked first in MHD with no ties. DeepMedic produced a median Dice of 0.63 (lower-upper quartiles: 0.45-0.75) and a median MHD of 0.24 cm (0.16 cm-0.37 cm). 50% SUVmax and adaptive region-growing were robust to lesion volume and TBR, whereas all other methods showed significant decreases in Dice and/or increases in MHD in tumors below the median TBR or volume. All methods except adaptive region-growing showed significant dependence on the initial input mask.
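The summary statistics and the below-median versus above-median stratification reported above could be computed along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the input values are toy numbers rather than study data, and the significance test for clustered data used in the abstract is not reproduced here.

```python
import numpy as np

def median_iqr(values):
    """Return (median, lower quartile, upper quartile) of per-lesion scores."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q1, q3

def split_at_median(scores, covariate):
    """Split per-lesion scores into lesions below / at-or-above the
    median of a covariate such as TBR or volume."""
    scores = np.asarray(scores, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    cut = np.median(covariate)
    return scores[covariate < cut], scores[covariate >= cut]

# Hypothetical per-lesion values, for illustration only.
toy_dice = [0.2, 0.4, 0.6, 0.8]
toy_tbr = [1.0, 2.0, 3.0, 4.0]
low_group, high_group = split_at_median(toy_dice, toy_tbr)
summary = median_iqr([1.0, 2.0, 3.0, 4.0, 5.0])
```

The two groups returned by `split_at_median` would then be passed to whichever rank test is appropriate; because several lesions come from the same patient, a test that accounts for clustering is needed, as in the abstract.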
Conclusion: A dual-pathway 3D CNN achieved the highest overall segmentation performance in lymphoma patients, with large variability in performance across all methods. Most methods showed poorer performance in the small, low-contrast tumors common in lymphoma.