
Stratification of Lung Cancer Risk From Personal Health Data

GR Hart*, DA Roffman, R Decker, J Deng, Yale University School of Medicine, New Haven, CT


(Monday, 7/30/2018) 4:30 PM - 6:00 PM

Room: Davidson Ballroom A

Purpose: As with all cancers, the five-year survival rate for lung cancer improves greatly when the disease is detected early: 55% versus an overall survival rate of 18%. Improved early detection can therefore save many lives. The current standard for detecting lung cancer is the low-dose CT scan (LDCT). Because of its expense, time, and risk, LDCT is recommended only for the population at highest risk, and it is plagued by a high false-positive rate that leads to follow-up tests with further expense, time, and risk. Better identification of those who should be screened would greatly improve the effectiveness of LDCT and early cancer detection.

Methods: Thirteen parameters representative of the data found in electronic medical records (gender, age, race, Hispanic ethnicity, vigorous exercise habits, smoking status, BMI, hypertension, diabetic status, emphysema, asthma, heart disease, and history of stroke) along with lung cancer diagnosis were extracted from National Health Interview Survey (NHIS) data. This data was collected from 1997–2015 (except 2004) and gave 648 cancer and 488,418 non-cancer cases. This data was split and used to train and validate a neural network (NN) to predict an individual’s cancer risk and stratify them into low, medium, and high risk.
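The stratification step described above can be illustrated with a minimal sketch. It assumes the NN outputs a risk score between 0 and 1 and that two cutoffs separate the three groups. The function name `stratify` and the cutoff values 0.2 and 0.8 are hypothetical placeholders; the abstract does not state how the boundaries were chosen.

```python
def stratify(risk, low_cut=0.2, high_cut=0.8):
    """Map a predicted risk score in [0, 1] to a risk category.

    low_cut and high_cut are illustrative assumptions, not values
    reported in the abstract.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    if low_cut >= high_cut:
        raise ValueError("low_cut must be smaller than high_cut")
    if risk < low_cut:
        return "low"
    if risk >= high_cut:
        return "high"
    return "medium"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for score in (0.05, 0.50, 0.93):
        print(score, stratify(score))
```

In practice the cutoffs would be tuned on held-out data, since they control how many people fall into each group.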

Results: The training dataset had sensitivity 79.8% (95% CI, 75.9%-83.6%), specificity 79.9% (79.8%-80.1%), and AUC 0.86 (0.84-0.88). The validation dataset had sensitivity 75.3% (68.9%-81.6%), specificity 80.6% (80.3%-80.8%), and AUC 0.88 (0.84-0.91). We classified 18.2% of those with cancer as high risk, 80.8% as medium risk, and 1.82% as low risk. For those without cancer we classified 1.16%, 86.8%, and 12.1% as high, medium, and low risk respectively.
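For readers who want to reproduce figures of this kind, the sketch below computes sensitivity and specificity with 95% confidence intervals from confusion-matrix counts. It assumes a normal-approximation (Wald) interval; the abstract does not say which interval method was used, and the counts in the usage example are hypothetical, not the study's data.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) using a Wald interval.

    z=1.96 corresponds to a 95% interval. The Wald method is an
    assumption here; other interval methods exist.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return {
        "sensitivity": proportion_with_ci(tp, tp + fn),
        "specificity": proportion_with_ci(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    result = sensitivity_specificity(tp=80, fn=20, tn=900, fp=100)
    for name, (est, lo, hi) in result.items():
        print(f"{name}: {est:.3f} ({lo:.3f}-{hi:.3f})")
```

Because cancer cases are far rarer than non-cancer cases in such data, the sensitivity interval is much wider than the specificity interval, which matches the pattern in the reported results.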

Conclusion: Our results indicate that the use of an NN based on personal health information gives high specificity and modest sensitivity for lung cancer detection, offering a cost-effective and non-invasive clinical tool for risk stratification.


Pattern Recognition, Computer Software, Statistical Analysis

