Simulation of Realistic Organ-At-Risk Delineation Variability in Head and Neck Radiation Therapy

W Choi1*, E Aliotta2, H Nourzadeh3, J Siebers4, (1) University of Virginia, Charlottesville, VA, (2) University of Virginia, Charlottesville, VA, (3) University of Virginia Health Systems, Charlottesville, VA, (4) University of Virginia Health System, Charlottesville, VA


(Sunday, 7/14/2019) 4:00 PM - 5:00 PM

Room: 225BCD

Purpose: To simulate realistic manual delineation (MD) organ-at-risk (OAR) delineation variability (DV) for the purpose of quantifying the dosimetric impact of DV.

Methods: Fourteen independent MD head-and-neck OAR structure sets (SS) were obtained from the ESTRO Falcon group. Seven OARs were available (BrainStem, Esophagus, OralCavity, Parotid_L, Parotid_R, SpinalCord, and Thyroid). A consensus MD SS was generated with the simultaneous truth and performance level estimation (STAPLE) method. MD DV was evaluated with respect to the STAPLE SS using two geometric similarity metrics, the Dice coefficient and the Hausdorff distance (HD). DVs were simulated using three auto-delineation (AD) methods: an average surface of standard deviation (ASSD) method, GrowCut segmentation, and random walker (RW) segmentation. Each OAR AD was repeated five times with a different seed or variability level. Dice and HD were computed for each OAR AD with respect to the STAPLE SS. Dosimetric analysis was performed by comparing dose-volume histograms (DVHs) from a plan developed with a reference MD SS against the DVHs for each MD and AD. DVH confidence bands are reported for MD and for each AD method.
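The two geometric metrics named above can be illustrated with a minimal sketch. It assumes each delineation is available as a binary voxel mask on a common grid; the function names `dice` and `hausdorff`, the `spacing` parameter, and the toy masks are hypothetical and are not taken from the study. The Hausdorff distance here is computed by brute force over all voxels of each mask, which is only practical for small masks.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between the voxel sets of two masks.

    spacing is the (assumed) voxel size per axis, so the result is in
    the same physical units as spacing.
    """
    pa = np.argwhere(np.asarray(a, dtype=bool)) * np.asarray(spacing)
    pb = np.argwhere(np.asarray(b, dtype=bool)) * np.asarray(spacing)
    # Pairwise Euclidean distances, shape (len(pa), len(pb)).
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in either direction.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
reference = np.zeros((10, 10, 10), dtype=bool)
reference[2:6, 2:6, 2:6] = True
shifted = np.zeros((10, 10, 10), dtype=bool)
shifted[3:7, 2:6, 2:6] = True

print(dice(reference, shifted))       # overlap-based agreement
print(hausdorff(reference, shifted))  # worst-case boundary disagreement
```

Dice summarizes volumetric overlap, while HD reports the single largest disagreement, so the two metrics can rank the same set of delineations differently.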

Results: The MD Dice was 0.7±0.2 (μ±σ). AD Dice values (ASSD, GrowCut, and RW) were 0.5±0.2, 0.7±0.2, and 0.8±0.1, respectively. HDs (MD, ASSD, GrowCut, and RW) were 35.4±45.2, 27.3±19.1, 29.3±19.9, and 14.6±10.3, respectively. The simulated DV increased with increasing seed standard deviation or variability level. The dosimetric effect was largest for MD DVs (larger OAR DVH confidence intervals and larger HD), even though the MD Dice was greater than or equal to the ASSD and GrowCut Dice values. GrowCut DV resulted in less dosimetric variation than RW, which is the opposite of the ordering suggested by the geometric indices.
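A DVH confidence band of the kind reported here could be sketched as follows. The abstract does not state how the bands were computed, so the pointwise percentile envelope used below is an assumption, and the names `cumulative_dvh` and `dvh_band`, the percentile defaults, and the toy dose grid are hypothetical.

```python
import numpy as np

def cumulative_dvh(dose, mask, dose_bins):
    """Percent of the structure volume receiving at least each dose level."""
    d = np.asarray(dose)[np.asarray(mask, dtype=bool)]
    if d.size == 0:
        return np.zeros(len(dose_bins))
    return np.array([(d >= level).mean() for level in dose_bins]) * 100.0

def dvh_band(dose, masks, dose_bins, lower=2.5, upper=97.5):
    """Pointwise percentile envelope over the DVHs of several delineations.

    Assumption: the band is taken as the [lower, upper] percentiles of the
    volume values at each dose level, using one fixed dose distribution.
    """
    curves = np.stack([cumulative_dvh(dose, m, dose_bins) for m in masks])
    return (np.percentile(curves, lower, axis=0),
            np.percentile(curves, upper, axis=0))

# Toy example: one dose grid, two alternative delineations of the same OAR.
dose = np.arange(8.0).reshape(2, 2, 2)   # dose values 0..7
whole = np.ones((2, 2, 2), dtype=bool)   # delineation covering every voxel
hot_half = dose >= 4                     # delineation covering the high-dose half
bins = np.array([0.0, 4.0, 8.0])

band_lo, band_hi = dvh_band(dose, [whole, hot_half], bins, lower=0, upper=100)
print(band_lo, band_hi)
```

A wider band at a given dose level indicates that the delineation differences change the reported OAR dose more strongly there, independent of how the same delineations score on Dice or HD.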

Conclusion: We developed a framework to simulate DVs and demonstrated its feasibility. ADs were able to simulate different magnitudes of DVs, but did not replicate the dosimetric consequences of human delineation variability. The correlation between geometric similarity metrics and dosimetric consequences of DV is poor. 

