Room: Karl Dean Ballroom C
Purpose: Data cleaning consumes about 80% of the time spent on data analysis in clinical research projects, especially in the era of big data, when large volumes of data are being generated. We report an initial effort toward automated data cleaning for head and neck cancer patient data using deep learning: the standardization of organ labeling in radiation therapy.
Methods: Critical organs are often labeled inconsistently across institutions, and sometimes within the same institution over time. We developed a 3D convolutional neural network (CNN) to automatically identify the critical organs in CT images and label them with the standardized nomenclature recommended by AAPM Task Group 263. We used organ masks and CT images as raw data. For each slice, a composite organ mask was generated by superimposing all organ masks onto a single image. The composite masks, the corresponding CT images, and the individual organ masks were fed as volumetric data into 3 separate channels of the model. The model effectively identified high-dimensional spatial, structural, and intensity features for each organ relative to the neighboring organs. Finally, once the organs had been identified, standardized labels were assigned to them.
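The three-channel input described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_input_volume`, the binary union used for the composite mask, the channel order, and the absence of any intensity normalization are all hypothetical choices made for the example.

```python
import numpy as np

def build_input_volume(ct, organ_masks, target):
    """Stack composite mask, CT, and one organ mask into a 3-channel volume.

    ct          : (depth, height, width) array of CT intensities
    organ_masks : dict mapping an organ key to a (depth, height, width) mask
    target      : key of the organ whose label is to be predicted

    Assumption: the composite mask is the binary union of all organ masks.
    The abstract only says the masks are superimposed slice by slice.
    """
    composite = np.zeros(ct.shape, dtype=bool)
    for mask in organ_masks.values():
        # Superimpose every organ mask onto one image per slice.
        composite |= mask.astype(bool)

    # Assumed channel order: composite mask, CT image, individual organ mask.
    channels = (composite, ct, organ_masks[target])
    return np.stack([c.astype(np.float32) for c in channels], axis=0)

# Toy usage with synthetic data (not patient data).
ct = np.random.rand(2, 4, 4)
mask_a = np.zeros((2, 4, 4), dtype=bool)
mask_a[0, 1:3, 1:3] = True
mask_b = np.zeros((2, 4, 4), dtype=bool)
mask_b[1, 0:2, 0:2] = True
volume = build_input_volume(ct, {"a": mask_a, "b": mask_b}, target="a")
print(volume.shape)
```

One such volume would be built per organ, so the network sees each organ mask in the context of its neighbors and the underlying CT intensities.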
Results: The model was tested on two head and neck datasets. First, we used 54 patients with 9 common organs present; the model achieved an organ identification accuracy of 100%. Next, we tested the model on 218 previously treated patients using a comprehensive list of 29 organs. Each patient had a different subset of the 29 organs, which posed a considerable challenge for the model, and some organs (left and right lens, lips) had fewer than 25 samples. Even so, the model achieved an accuracy of approximately 96%.
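Because some organs have few samples, an overall accuracy figure can hide weak performance on rare organs. The sketch below shows one way to compute overall and per-organ identification accuracy from true and predicted labels. The function names and the toy labels are hypothetical and serve only as illustration; they are not data or code from this study.

```python
from collections import defaultdict

def overall_accuracy(true_labels, predicted_labels):
    """Fraction of organs whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

def per_organ_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true organ label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {organ: correct[organ] / total[organ] for organ in total}

# Toy labels for illustration only (not results from the abstract).
toy_true = ["Brainstem", "Brainstem", "Lens_L", "Lens_L"]
toy_pred = ["Brainstem", "Brainstem", "Lens_L", "Lens_R"]
toy_overall = overall_accuracy(toy_true, toy_pred)
toy_per_organ = per_organ_accuracy(toy_true, toy_pred)
print(toy_overall, toy_per_organ)
```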
Conclusion: Despite the scarcity of samples for some critical organs, the model identified organs with high accuracy (approximately 96% on the 29-organ dataset). Next, we plan to address the data-imbalance problem, with the goal of reaching 100% accuracy.