
Machine Learning Applications of Forecasting in Healthcare


As part of my MS in Analytics program, I had an opportunity to discuss an application of machine learning methods to forecasting in healthcare. I wanted to share it with the SAS community.




Sepsis: Per Anonymous (2016), “Sepsis is a potentially life-threatening complication of an infection. Sepsis occurs when chemicals released into the bloodstream to fight the infection trigger inflammatory responses throughout the body.”


The article summarized here (Thottakkara et al., 2016) is about forecasting complications, such as severe sepsis and acute kidney injury, that can follow surgery.


Business Problem:


The authors wanted to predict postoperative complications before surgery, using only pre-operative data.


Data (variable types, data sources):


The authors used all available pre-operative clinical and administrative data, drawn from multiple databases and health systems, for the years 2000 to 2010. All patients aged 18 and older were included in the study. Altogether, the authors processed “285 demographic (e.g., Age, Gender, Race etc.), socio-economic (e.g., primary insurance, zip code etc.), administrative (day of admission, month of admission, weekend admission, etc.), clinical (e.g., major diagnosis category, Myocardial Infarction etc.), pharmacy (e.g., diuretics, steroids etc.) and laboratory variables (e.g., Hematocrit, Reference serum Creatinine)”.
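To make the administrative variables concrete, here is a minimal Python sketch of how day of admission, month of admission, and weekend admission could be derived from an admission date. The function name admission_features and the dictionary keys are hypothetical, and the article does not say how the authors actually built these variables.

```python
from datetime import date

# Day names indexed by date.weekday(), where Monday is 0 and Sunday is 6.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def admission_features(admit_date):
    """Derive three administrative variables from an admission date.

    Hypothetical helper for illustration only; not the authors' code.
    """
    weekday_index = admit_date.weekday()
    return {
        "day_of_admission": DAY_NAMES[weekday_index],
        "month_of_admission": admit_date.month,
        "weekend_admission": weekday_index >= 5,  # Saturday or Sunday
    }

# Made-up admission dates, not taken from the study data.
saturday_case = admission_features(date(2010, 5, 15))
monday_case = admission_features(date(2010, 5, 17))
print(saturday_case)
print(monday_case)
```

Each patient record would then carry these three values alongside the demographic, clinical, pharmacy, and laboratory variables.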


Machine Learning Methods Used:


The authors compared four models: Naïve Bayes, a generalized additive model, logistic regression, and a support vector machine.




Naïve Bayes: Used so that the predictive model learns the distribution of the input data.

Logistic Regression and Generalized Additive Model (GAM): Used to map directly from the input data to the response labels. With logistic models, the authors could see whether the predicted risk increased or decreased with a given variable. With GAM, they could estimate non-linear risk functions for continuous variables, e.g., age, hematocrit, and hemoglobin.

Support Vector Machine (SVM): Used to find a separating decision boundary in the input feature space.
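As an illustration of how a logistic model shows whether risk goes up or down with a variable, the sketch below fits a one-variable logistic regression by plain gradient descent and reads the sign of the slope. The data are made up, the names fit_logistic, ages, and complication are hypothetical, and this is not the authors' implementation; in practice a statistical package would do the fitting.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit risk = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Made-up toy data: standardized age and whether a complication occurred.
ages = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
complication = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(ages, complication)
# A positive slope means predicted risk rises with the variable.
direction = "increasing" if b1 > 0 else "decreasing"
print("slope:", round(b1, 3), "-> risk is", direction, "with age")
```

A single slope can only describe risk that moves in one direction, which is why a GAM, with a smooth function per continuous variable, is the better fit when risk rises at both low and high values of a lab measurement.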




The results in the article show that both GAM and logistic regression performed better and fit better than Naïve Bayes and SVM, with Area Under the Curve (AUC) above 0.80, between 0.022 and 0.03 higher, for predicting acute kidney injury and severe sepsis respectively. GAM addressed the non-linearity of continuous clinical variables: the risk patterns for hemoglobin and plasma hematocrit showed that non-linear models capture variation in risk more effectively than linear models.
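For readers unfamiliar with AUC, the following sketch computes it directly from its definition: the probability that a randomly chosen patient with the complication gets a higher predicted risk than a randomly chosen patient without it. The labels and the scores for model_a and model_b are invented for illustration and are not results from the article.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented outcomes (1 = complication) and predicted risks from two models.
labels = [0, 0, 1, 0, 1, 1]
model_a = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
model_b = [0.2, 0.6, 0.3, 0.5, 0.7, 0.4]

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
print("AUC model A:", round(auc_a, 3))
print("AUC model B:", round(auc_b, 3))
print("difference:", round(auc_a - auc_b, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences of a few hundredths between models, as reported in the article, are modest but can still matter across many patients.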



Thottakkara P, Ozrazgat-Baslanti T, Hupf BB, Rashidi P, Pardalos P, Momcilovic P, et al. (2016) Application of Machine Learning Techniques to High Dimensional Clinical Data to Forecast Postoperative Complications. PLOS ONE 11(5): e0155705. doi: 10.1371/journal.pone.0155705. 1-19.


Machine Learning Applications-Healthcare


Anonymous (2016) Sepsis

