
Data Collection for Radiation Oncology Big Data Initiatives: An Integrated Clinical Workflow Based Solution

R Kapoor1*, W Sleeman1, J Palta1, (1) Virginia Commonwealth University, Richmond, VA


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

As part of routine clinical practice, large amounts of information are entered into our electronic medical records (EMR), radiation oncology information systems (ROIS) and treatment planning systems (TPS). To be extracted for big data initiatives, this information needs to be in a structured, discrete format, but much of it is stored as free text, often in Word documents. Here we present a method to create Word templates with macros so that discrete data elements can be programmatically extracted for clinical data analysis.

We have developed macro-enabled Word template documents for our on-treatment visit (OTV) notes across seven disease sites in the ARIA treatment management system. ARIA stores clinical notes in a flat-file data structure without descriptive file names, but specific notes can be located by querying its backend MS-SQL database. Discrete data elements are extracted from the Word documents using the built-in macros and are exported in JavaScript Object Notation (JSON) format. The JSON data are then loaded into a SQL-like DataFrame in Python, to which data analysis can be applied directly.
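The final step, loading exported JSON into a DataFrame, might look roughly like the sketch below. It assumes pandas as the DataFrame library and one JSON object per OTV note; the field names (`patient_id`, `disease_site`, `note_date`, `toxicities`) and the sample values are hypothetical placeholders, not the actual template schema or clinical data.

```python
import json

import pandas as pd

# Hypothetical JSON exports, one string per OTV note. Field names and
# values are illustrative only; the real macro output may differ.
exported_notes = [
    '{"patient_id": "P001", "disease_site": "lung", '
    '"note_date": "2019-01-15", "toxicities": {"fatigue": 1, "cough": 2}}',
    '{"patient_id": "P001", "disease_site": "lung", '
    '"note_date": "2019-01-22", "toxicities": {"fatigue": 2, "cough": 2}}',
    '{"patient_id": "P002", "disease_site": "prostate", '
    '"note_date": "2019-02-03", "toxicities": {"fatigue": 0, "dysuria": 1}}',
]


def notes_to_dataframe(json_strings):
    """Parse JSON note exports and flatten them into one row per note."""
    records = [json.loads(s) for s in json_strings]
    # json_normalize expands the nested "toxicities" object into columns
    # such as "toxicities.fatigue"; scores absent from a note become NaN.
    df = pd.json_normalize(records)
    df["note_date"] = pd.to_datetime(df["note_date"])
    return df


df = notes_to_dataframe(exported_notes)

# Example analysis: number of unique patients per primary disease site.
patients_per_site = df.groupby("disease_site")["patient_id"].nunique()
```

Flattening each note into a single row keeps every discrete element addressable as a column, so counts per disease site or toxicity summaries reduce to ordinary group-by operations.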

From notes completed between Dec 2018 and Feb 2020, 3671 OTV notes for 1218 unique patients were programmatically extracted using this system. Unique patient counts by primary disease site included lung (n=263), prostate (n=254), breast (n=287), and H&N (n=223). We were able to capture seven to nine disease-site-specific toxicity scores per patient, and the total number of data elements collected from the OTV templates ranged from 35 to 55 (median: 42 elements).

Data need to be captured as part of the routine clinical workflow using structured clinical templates for meaningful big data repositories to be created. In addition, standard ontologies and data dictionaries need to be created to enable sharing of the knowledge acquired from the collected datasets.




IM/TH- Informatics: Data archiving - Therapy
