Room: AAPM ePoster Library
Purpose: Personalized treatment is desirable, but differences in management may also reflect implicit bias from caregivers. To understand the factors that contribute to differences in management, we hypothesize, develop and investigate novel natural language processing techniques that, unlike conventional methods, (1) detect whether race-related differences exist and (2) identify the most prominent contributing factors.
Methods: Under an IRB-approved protocol, 4209 clinical notes with corresponding race information were collected for prostate cancer treatment: 3434 from White patients, 173 from Asian, 71 from Black, 1 from American Indian, and 530 from patients whose race was undisclosed. The data set was split into training, validation and testing sets in a 0.7:0.15:0.15 ratio. We performed binary classification of Black versus all other groups. To balance the classes, Black samples were weighted 60:1 relative to all others in the cost function. Preprocessing included case conversion and tokenization at both the sentence and word level. Two deep networks were investigated: model A consists of a core representation model regularized by a variational information bottleneck (VIB) module, followed by a support vector machine (SVM); model B uses a hierarchical attention structure with bidirectional gated recurrent unit (GRU) modules and attention layers at both the word and sentence level. The major contributing factors were probed and reported directly for each model.
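With only 71 Black notes out of 4209, a plain random 0.7:0.15:0.15 split could leave very few positives in the validation or testing set, and the 60:1 weight roughly matches the inverse class frequency (4138 / 71 is about 58.3). The sketch below shows one way to do both; stratifying by class and using weighted binary cross-entropy are assumptions, since the abstract does not say how the split was drawn or which cost function was used. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, splitting each class separately
    so the rare positive class appears in every partition."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    splits = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(ratios[0] * n)
        n_val = round(ratios[1] * n)
        splits[0].extend(indices[:n_train])
        splits[1].extend(indices[n_train:n_train + n_val])
        splits[2].extend(indices[n_train + n_val:])  # remainder goes to test
    return splits

def weighted_bce(p, y, pos_weight=60.0, neg_weight=1.0, eps=1e-12):
    """Binary cross-entropy with the positive (Black) class weighted 60:1."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(pos_weight * y * math.log(p) + neg_weight * (1 - y) * math.log(1 - p))

# Label 1 = Black (71 notes), 0 = all other groups (4209 - 71 = 4138 notes).
labels = [1] * 71 + [0] * 4138
train, val, test = stratified_split(labels)
```

With these counts the split yields 2947 training, 632 validation and 630 testing notes, with 50, 11 and 10 Black notes respectively.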
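The preprocessing step (case conversion, then sentence- and word-level tokenization) produces the nested sentence/word structure that a hierarchical model consumes. The regex tokenizer below is a simplified, hypothetical stand-in; the abstract does not name the tokenizer, and a naive period-based sentence splitter will mis-split abbreviations common in clinical notes.

```python
import re

# Split sentences on whitespace that follows ., ! or ? (naive assumption).
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Words: alphanumeric runs, allowing internal hyphens or apostrophes.
WORD = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")

def preprocess(note):
    """Lower-case a note and return a list of sentences, each a list of words."""
    text = note.lower()
    sentences = [s for s in SENT_BOUNDARY.split(text.strip()) if s]
    return [WORD.findall(s) for s in sentences]

doc = preprocess("Pt has Type 2 diabetes. On metformin 500 mg BID.")
# [['pt', 'has', 'type', '2', 'diabetes'], ['on', 'metformin', '500', 'mg', 'bid']]
```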
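For model A, a variational information bottleneck typically encodes each note as a diagonal Gaussian with mean mu and log-variance log_var, samples a representation by reparameterization, and adds a beta-weighted KL penalty toward a standard normal to the task loss. The sketch below shows those pieces plus a linear SVM score and hinge loss. Feeding the bottleneck mean to the SVM, and the value of beta, are assumptions for illustration; the abstract does not describe how the VIB and SVM are connected.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Task loss plus the beta-weighted bottleneck penalty (beta is an assumed value)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)

def svm_score(w, bias, features):
    """Linear SVM decision value; here features would be the bottleneck mean mu."""
    return sum(wi * xi for wi, xi in zip(w, features)) + bias

def hinge_loss(score, y):
    """SVM hinge loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)
```

A larger beta squeezes more information out of the representation, which is what makes the VIB act as a regularizer.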
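Model B follows the hierarchical attention pattern: word hidden states (from a bidirectional GRU) are pooled into a sentence vector by word-level attention, and sentence vectors are pooled into a document vector by sentence-level attention. The attention weights are also a natural way to probe contributing factors, since high-weight words can be read off directly. The sketch below implements only the additive attention pooling in plain Python; the GRU is omitted, the hidden states and parameters (W, b, u_ctx) are toy values rather than learned ones, and ranking words by weight is an assumed probing method.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_pool(H, W, b, u_ctx):
    """Additive attention: u_i = tanh(W h_i + b), alpha = softmax(u_i . u_ctx),
    output = sum_i alpha_i h_i. Returns (pooled vector, weights)."""
    scores = []
    for h in H:
        u = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, u_ctx)))
    alpha = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return pooled, alpha

def encode_document(doc_states, word_params, sent_params):
    """doc_states: one list of word hidden states per sentence."""
    sent_vecs, word_alphas = [], []
    for H in doc_states:
        v, a = attention_pool(H, *word_params)  # word-level attention
        sent_vecs.append(v)
        word_alphas.append(a)
    doc_vec, sent_alpha = attention_pool(sent_vecs, *sent_params)  # sentence level
    return doc_vec, word_alphas, sent_alpha

def top_words(tokens, alpha, k=3):
    """Rank tokens by attention weight (hypothetical probing helper)."""
    ranked = sorted(zip(tokens, alpha), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]

# Toy 2-d states and identity projection; the context vector favors dimension 0.
H = [[0.1, 0.0], [2.0, 0.0], [0.5, 0.0]]
params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.0])
pooled, alpha = attention_pool(H, *params)
print(top_words(["has", "diabetes", "type"], alpha, k=1))  # ['diabetes']
```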
Results: Both classification methods achieved accuracy, precision and recall above 0.7 on the training, validation and testing sets. Model A identified comorbidity factors such as diabetes, and model B identified medications, as major contributing factors distinguishing the race groups.
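Because the positive class is so rare, accuracy alone can look strong for a useless classifier; precision and recall on the positive class are what reveal whether Black notes are actually detected. The sketch below computes the three reported metrics, assuming Black is the positive class; the abstract does not state which class precision and recall refer to or whether they were averaged across classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) with `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# A predictor that always answers "not Black" on the full data set:
y_true = [1] * 71 + [0] * 4138
acc, prec, rec = binary_metrics(y_true, [0] * len(y_true))
# accuracy is about 0.983, yet precision and recall are both 0.0
```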
Conclusion: This pilot study suggests that document embedding models combined with deep networks may reveal patterns without prior presumptions. Preliminary results show a correlation between comorbidity, medication and race, which agrees with clinical insight. These automatically derived findings point to clinical rationales as the driving force behind the treatment differences.
Classifier Design, Prostate Therapy
TH- Dataset Analysis/Biomathematics: Machine learning techniques