Does the Choice of Deep Learning Architecture Matter? Experience From a Radiotherapy Case Study

S Gay¹*, A Jhingran¹, B Anderson¹, L Zhang¹, D Rhee¹, C Nguyen¹, T Netherton¹, J Yang¹, K Brock¹, H Simonds², A Klopp¹, B Beadle³, K Kisling⁴, L Court¹, C Cardenas¹, (1) The University of Texas MD Anderson Cancer Center, Houston, TX, (2) Stellenbosch University, Stellenbosch, ZA, (3) Stanford University, Stanford, CA, (4) UC San Diego, La Jolla, CA

S Gay

Presentations

(Sunday, 7/12/2020) [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Purpose: To extensively evaluate deep learning architectures commonly used for image segmentation and identify those that are more robust to initial hyperparameter selection.

Methods: A total of 1295 unique autosegmentation models were trained to delineate radiotherapy field apertures using DRRs (over 23,000 computing hours in total). Four-field-box female pelvis cases were obtained retrospectively and split into training, cross-validation, and test sets of 229, 26, and 55 cases, respectively. Each case contained AP, PA, and lateral DRRs. Five commonly used architectures (DeepLabv3+, D-LinkNet, VGG-19+decoder, U-Net, and Res-U-Net) were trained with three learning rates and seven intensity normalization schemes. For U-Net and Res-U-Net, additional hyperparameters were tested, including network depth, kernel size, and number of first-layer features. Training took place on single Tesla V100 GPUs using the Adam optimizer, with early stopping for regularization. To evaluate model predictions, both the Dice similarity coefficient (DSC) and a composite score, defined as the average of multiple overlap and distance metrics, were used. The top-performing model for each architecture was identified from the highest DSC and composite scores; relative sensitivity to initial hyperparameters was compared using the 25th percentile scores for each architecture.
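For readers unfamiliar with the overlap metric named above, the following is a minimal sketch of the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not the authors' evaluation code: the function name dice_similarity, the use of NumPy, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_similarity(pred, truth):
    """Dice similarity coefficient between two binary masks.

    Hypothetical helper for illustration only; not taken from the abstract.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    # |A| + |B|: total number of foreground pixels in both masks.
    total = int(pred.sum()) + int(truth.sum())
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = int(np.logical_and(pred, truth).sum())
    return 2.0 * intersection / total
```

A predicted aperture mask and a clinician-drawn mask of the same DRR would be passed as pred and truth; a value of 1.0 indicates identical masks and 0.0 indicates no overlap.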

Results: Learning rate emerged as the most important factor in model convergence, with all models except VGG-19 achieving similar top DSC and composite scores when the learning rate was 0.001 or less. After learning rate, Z-score intensity normalization was the key factor in achieving the best model performance. Based on 25th percentile scores, Residual U-Net and DeepLabv3+ emerged as the most robust models.
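The two ideas in these results, Z-score intensity normalization and robustness ranking by 25th percentile score, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not say whether normalization statistics were computed per image or over the whole dataset (per image is assumed here), and the function names zscore_normalize and percentile_25_by_architecture are hypothetical.

```python
import numpy as np


def zscore_normalize(image):
    """Rescale an image to zero mean and unit standard deviation.

    Assumption: statistics are computed per image, which the abstract
    does not specify.
    """
    image = np.asarray(image, dtype=np.float64)
    std = image.std()
    if std == 0:
        # A constant image carries no contrast; return zeros rather than
        # dividing by zero.
        return np.zeros_like(image)
    return (image - image.mean()) / std


def percentile_25_by_architecture(scores):
    """Summarize each architecture by the 25th percentile of its scores.

    `scores` maps an architecture name to the scores (for example DSC) of
    all models trained with that architecture across hyperparameter
    settings. A higher 25th percentile means that even the weaker
    configurations performed well, i.e. lower sensitivity to the initial
    hyperparameter choice.
    """
    return {
        name: float(np.percentile(np.asarray(values, dtype=np.float64), 25))
        for name, values in scores.items()
    }
```

Ranking architectures by the returned values, highest first, mirrors the comparison described above: an architecture whose lower-quartile score stays high is less dependent on a lucky hyperparameter choice.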

Conclusion: While most commonly used image segmentation architectures can reach acceptable convergence, the choice of hyperparameters, particularly a learning rate that is too high, greatly affects model training. If training time permits, Residual U-Net is an excellent architecture; however, in time-constrained settings or when a slightly higher margin of error is acceptable, DeepLabv3+ provides similarly good results.

Funding Support, Disclosures, and Conflict of Interest: This work was partially funded by NCI (UH3CA202665). Disclosures: Our research group receives additional funding from Varian Medical Systems.

