Does the Choice of Deep Learning Architecture Matter? Experience From a Radiotherapy Case Study

S Gay1*, A Jhingran1, B Anderson1, L Zhang1, D Rhee1, C Nguyen1, T Netherton1, J Yang1, K Brock1, H Simonds2, A Klopp1, B Beadle3, K Kisling4, L Court1, C Cardenas1, (1) The University of Texas MD Anderson Cancer Center, Houston, TX, (2) Stellenbosch University, Stellenbosch, ZA, (3) Stanford University, Stanford, CA, (4) UC San Diego, La Jolla, CA


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Purpose: To extensively evaluate deep learning architectures commonly used for image segmentation and identify those that are more robust to initial hyperparameter selection.

Methods: A total of 1295 unique autosegmentation models were trained to delineate radiotherapy field apertures from DRRs (over 23,000 computing hours in total). Four-field-box female pelvis cases were obtained retrospectively and split into 229 training, 26 cross-validation, and 55 test cases. Each case contained AP, PA, and lateral DRRs. Five commonly used architectures (DeepLabv3+, D-LinkNet, VGG-19+decoder, U-Net, and Res-U-Net) were trained with three learning rates and seven intensity normalization schemes. For U-Net and Res-U-Net, additional hyperparameters were tested, including network depth, kernel size, and number of first-layer features. Each model was trained on a single Tesla V100 GPU using the Adam optimizer, with early stopping as regularization. Inferences were evaluated with both the Dice similarity coefficient (DSC) and a composite score formed by averaging multiple overlap and distance metrics. The top-performing model for each architecture was identified from the highest DSC and composite scores; relative sensitivity to initial hyperparameters was compared using the 25th percentile scores of the models trained for each architecture.
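The two evaluation ideas in the Methods can be illustrated with a minimal sketch. It assumes binary aperture masks stored as NumPy arrays and a flat list of per-model scores for one architecture. The function names `dice_coefficient` and `robustness_score` are hypothetical, the empty-mask convention is an assumption, and the composite score is not reproduced because its component metrics are not listed in the abstract.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def robustness_score(scores):
    """25th percentile of the scores of all models trained for one architecture.

    A high value means that even the weaker hyperparameter settings
    still produced good segmentations.
    """
    return float(np.percentile(scores, 25))
```

Under this reading, an architecture whose 25th percentile score stays close to its best score is less sensitive to the initial hyperparameter choice than one whose lower quartile drops sharply.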

Results: Learning rate emerged as the most important factor in model convergence: all architectures except VGG-19+decoder achieved similar top DSC and composite scores when the learning rate was 0.001 or less. After learning rate, Z-score intensity normalization was the key factor for best model performance. Based on 25th percentile scores, Residual U-Net and DeepLabv3+ emerged as the most robust architectures.
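For readers unfamiliar with Z-score intensity normalization, a minimal sketch follows. The abstract does not state whether the statistics were computed per image or over the whole training set; this example assumes per-image statistics, and the function name `zscore_normalize` and the `eps` guard are hypothetical.

```python
import numpy as np


def zscore_normalize(image, eps=1e-8):
    """Shift and scale an image to zero mean and (approximately) unit variance.

    Assumption: mean and standard deviation are taken over the single
    input image; eps avoids division by zero for constant images.
    """
    image = np.asarray(image, dtype=np.float64)
    return (image - image.mean()) / (image.std() + eps)
```

A dataset-level variant would compute the mean and standard deviation once over the training DRRs and reuse them at inference time.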

Conclusion: While most commonly used image segmentation architectures can approach acceptable convergence, the choice of hyperparameters, particularly a learning rate that is too high, greatly affects model training. If training time permits, Residual U-Net is an excellent architecture; in time-constrained settings, or when a slightly higher margin of error is acceptable, DeepLabv3+ provides similarly good results.

Funding Support, Disclosures, and Conflict of Interest: This work was partially funded by NCI (UH3CA202665). Disclosures: Our research group receives additional funding from Varian Medical Systems.

