Quantitative Versus Qualitative and Dosimetric Evaluation of Automated Segmentations

J Pursley*, G Maquilan, G Sharp, Massachusetts General Hospital and Harvard Medical School, Boston, MA


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Purpose: As automated anatomic segmentations move into clinical implementation, questions remain about how to evaluate the quality of auto-segmented contours. Many studies use purely quantitative metrics such as measures of region overlap or surface distance. This work explores the use of a qualitative evaluation system for rating the clinical acceptability of auto-segmented contours and analyzes changes in dosimetric quantities for auto-segmented contours. A strong correlation between quantitative metrics and qualitative scores or dosimetric changes would establish a scientific basis for use of quantitative metrics in the evaluation of auto-segmentations.

Methods: The qualitative system used a 5-point scale ranging from “clinically acceptable” to “completely unacceptable”; multiple reviewers scored auto-segmented structures generated from user-defined MIM Software deformable atlases. Four quantitative metrics (Hausdorff distance, mean distance to agreement, Dice coefficient, and Jaccard coefficient) were calculated by comparing the auto-segmented contours to physician contours in three disease sites: prostate, pancreas, and head-and-neck cancer. Dose-volume histograms were generated for auto-segmented contours from clinical treatment plans optimized based on physician contours.
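For readers unfamiliar with the four quantitative metrics, the following is a minimal sketch of how they are commonly defined, not the implementation used in this work. It assumes each contour is represented as a set of voxel index tuples, and it assumes a symmetric nearest-point average for mean distance to agreement; commercial tools may define that metric differently and typically measure it on surface points only. All function names are illustrative.

```python
import math


def dice(a, b):
    """Dice coefficient: 2|A n B| / (|A| + |B|) for two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty contours agree
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient: |A n B| / |A u B| for two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _nearest_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worst nearest-point distance."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def mean_distance_to_agreement(a, b):
    """Average nearest-point distance over both directions (assumed form)."""
    d = _nearest_distances(a, b) + _nearest_distances(b, a)
    return sum(d) / len(d)


# Toy example (not study data): a 4x4 square and a copy shifted by one voxel.
reference = {(x, y) for x in range(4) for y in range(4)}
auto = {(x + 1, y) for (x, y) in reference}

print("Dice:", dice(reference, auto))
print("Jaccard:", jaccard(reference, auto))
print("Hausdorff:", hausdorff(reference, auto))
print("MDA:", mean_distance_to_agreement(reference, auto))
```

In this toy case the overlap metrics remain fairly high even though one full edge of the contour is displaced, which illustrates why overlap and surface-distance metrics can disagree about the same contour.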

Results: Results varied by structure type. For bony anatomy such as the femurs, mandible, and spinal canal, quantitative metrics failed to identify discrepancies that reviewers considered clinically significant, and absolute Pearson correlations ranged from R = 0 to 0.25. For soft-tissue organs, quantitative metrics tended to correlate with qualitative scores, although the degree of correlation varied with organ size, from R = 0.2 to 0.7 for small organs like the parotids to R = 0.5 to 0.9 for large organs like the bladder. The changes in organ mean and maximum dose, in both absolute dose and percentage, showed low correlation with both quantitative metrics (R = 0 to 0.6) and qualitative scores (R = 0 to 0.5).
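The correlations reported here are Pearson coefficients between a quantitative metric and a qualitative score or dose change. As a hedged illustration of that calculation only, the sketch below computes Pearson's coefficient in plain Python. The input values are invented for the example and are not data from this study, and the numeric coding of the qualitative scale (1 for the best score through 5 for the worst) is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented values for illustration: Dice per structure and reviewer score
# (assumed coding: 1 = clinically acceptable ... 5 = completely unacceptable).
dice_values = [0.9, 0.8, 0.7, 0.6]
reviewer_scores = [1, 2, 3, 4]

r = pearson_r(dice_values, reviewer_scores)
print("Pearson R:", r, "absolute value:", abs(r))
```

Because a higher Dice coefficient corresponds to a lower (better) score under this coding, the sign of the coefficient depends on the scale direction, which is one reason to compare absolute values.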

Conclusion: Results indicate that while quantitative metrics do have merit in evaluation of auto-segmentation, their applicability may vary with organ properties. Further research is needed to investigate the dosimetric consequences of modifications to anatomic contours.


Segmentation, Quality Assurance


IM/TH - Image Segmentation: CT
