Development and Validation of a Web-Crawler-Based Medical Records Information Aggregation Tool

H Liu*, Y Huang, Y Pu, H Wu, Y Zhang#, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital & Institute, Beijing, Beijing, CN. #Corresponding author: Yibao Zhang


(Sunday, 7/12/2020)   [Eastern Time (GMT-4)]

Room: AAPM ePoster Library

Medical applications of artificial intelligence depend strongly on data aggregation and organization. Conventional manual collection is labor-intensive and time-consuming. This work aims to develop and validate a web-crawler-based medical records information aggregation tool for efficient data mining from existing electronic information systems.

A web-crawler-based medical records information aggregation tool was built with the Selenium framework and the Python programming language. It was validated in two illustrative scenarios: 1. identifying radiation pneumonitis (RP) cases in the Hospital Information System (HIS), as an example of quick data search; 2. compiling an organized table that combines the desired data from various examination reports, as a test of its ability to facilitate clinical workflow. The automated and manual methods were compared in terms of efficiency and accuracy.
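As a rough illustration of the two scenarios, the sketch below screens report text for RP mentions and collects selected values from several reports into one table row. It is not the authors' implementation: the keyword pattern, the field names (`wbc`, `hgb`), and the sample report strings are all hypothetical, and the step of retrieving report text from the HIS through Selenium is omitted, so the functions simply take plain strings.

```python
import csv
import io
import re

# Hypothetical screening pattern; a real tool would likely need synonyms
# and negation handling ("no evidence of ...") to avoid false positives.
RP_PATTERN = re.compile(r"radiation\s+pneumonitis", re.IGNORECASE)

# Hypothetical field patterns for the summary table (scenario 2).
FIELD_PATTERNS = {
    "wbc": re.compile(r"WBC[:\s]+([\d.]+)"),
    "hgb": re.compile(r"HGB[:\s]+([\d.]+)"),
}


def is_rp_candidate(report_text):
    """Return True if the report text mentions radiation pneumonitis."""
    return bool(RP_PATTERN.search(report_text))


def extract_fields(reports):
    """Collect the first match of each field across a patient's reports."""
    row = {}
    for text in reports:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match and name not in row:
                row[name] = match.group(1)
    return row


def to_csv(rows, fieldnames):
    """Serialize the rows as CSV text; missing fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical report snippets for one patient.
reports = ["WBC: 5.2", "HGB 13.1 g/dL"]
table = to_csv([{"patient_id": "P001", **extract_fields(reports)}],
               ["patient_id", "wbc", "hgb"])
```

Because the values are copied programmatically from the report text rather than retyped, this style of aggregation avoids transcription typos, at the cost of depending on the report wording matching the patterns.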

The automated tool showed higher efficiency and accuracy than the manual method. In the first scenario, the automated method identified 110 RP cases among 3541 patients in about 54 seconds per patient, running on a Raspberry Pi 4B without any human intervention. The manual method identified the same group of RP cases but took about 90 seconds per patient. Confirming a non-RP case took longer because more data had to be examined and excluded to avoid false negatives, which suggests an even greater advantage for the automated method when searching for low-probability events, especially in large patient populations. In the second scenario, the automated and manual methods needed about 10 and 75 seconds per patient, respectively. The automated method also avoided the typos that were frequently observed in manually completed reports.
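The reported figures can be put in cohort-level terms with some simple arithmetic. The sketch below assumes the quoted per-patient times are averages over the whole cohort, which the abstract does not state explicitly, so the totals are indicative only.

```python
# Approximate figures quoted in the abstract.
N_PATIENTS = 3541
RP_CASES = 110
AUTO_S, MANUAL_S = 54, 90        # scenario 1, seconds per patient
AUTO2_S, MANUAL2_S = 10, 75      # scenario 2, seconds per patient


def total_hours(seconds_per_patient, n_patients=N_PATIENTS):
    """Total screening time in hours, assuming a uniform per-patient time."""
    return seconds_per_patient * n_patients / 3600


summary = {
    "rp_prevalence_percent": round(100 * RP_CASES / N_PATIENTS, 1),
    "auto_hours": round(total_hours(AUTO_S), 1),
    "manual_hours": round(total_hours(MANUAL_S), 1),
    "scenario1_speedup": round(MANUAL_S / AUTO_S, 2),
    "scenario2_speedup": round(MANUAL2_S / AUTO2_S, 2),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

Under that assumption, screening all 3541 patients takes roughly 53 hours of unattended machine time versus roughly 89 hours of staff time, a speedup of about 1.7 in the first scenario and 7.5 in the second, with RP cases making up about 3.1% of the cohort.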

A web-crawler-based medical records information aggregation tool has been successfully developed. The superior efficiency and accuracy of automated aggregation have been validated in specific clinical scenarios. Being cross-platform and easy to extend, this application could improve the productivity of radiologists and physicists in their clinical and research practice.

Funding Support, Disclosures, and Conflict of Interest: Capital's Funds for Health Improvement and Research [2018-4-1027]; Fundamental Research Funds for the Central Universities/Peking University Clinical Medicine Plus X - Young Scholars Project (PKU2020LCXQ019); National Key R&D Program of China (2019YFF01014405); Ministry of Education Science and Technology Development Center [2018A01019]; National Natural Science Foundation of China [11505012, 11905150].


Computer Software, Data Acquisition, Statistical Analysis


IM- Dataset Analysis/Biomathematics: Informatics