05-19-2017 10:13 PM - edited 05-19-2017 10:15 PM
As part of my MS in Analytics program, I had an opportunity to discuss the application of machine learning methods to forecasting in healthcare. I wanted to share it with the SAS community.
Sepsis: Per Anonymous (2016) “Sepsis is a potentially life threatening complication of an infection. Sepsis occurs when chemicals released into the bloodstream to fight the infection trigger inflammatory responses throughout the body”.
The article discussed here (Thottakkara et al., 2016) is about complications that follow surgeries in hospital operating rooms.
The authors were interested in predicting postoperative complications before surgery, using pre-operative data.
Data (variable types, data sources):
The authors used all of the available pre-operative clinical and administrative data from multiple databases and health systems between 2000 and 2010. All patients aged 18 and older were included in the study. Altogether, the authors processed “285 demographic (e.g., Age, Gender, Race etc.), socio-economic (e.g., primary insurance, zip code etc.) administrative (day of admission, month of admission, weekend admission, etc.) clinical (e.g., major diagnosis category, Myocardial Infarction etc.) pharmacy (e.g., diuretics, steroids etc.) and laboratory variables (e.g., Hematocrit, Reference serum Creatinine)”.
Machine Learning Methods Used:
The authors compared four models: Naïve Bayes, a generalized additive model (GAM), logistic regression, and support vector machines (SVM).
Naïve Bayes: The authors used this method so that the predictive model learns the distribution of the input data.
Logistic Regression and Generalized Additive Model (GAM): The authors used these methods to map directly from the input data to the response labels. With the logistic models, they could see whether the predicted risk was increasing or decreasing. With GAM, they could estimate non-linear risk functions for continuous variables such as age, hematocrit, and hemoglobin.
Support Vector Machine (SVM): The authors used SVM to find a separating decision boundary in the input feature space.
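To make the difference between "learning the input data distribution" and "mapping directly from input to label" more concrete, here is a minimal sketch in plain Python. It is not the authors' code or data: the single "age-like" feature, the sample size, and the helper names (fit_gaussian_nb, nb_posterior, fit_logistic) are all assumptions made for illustration. With only one feature, the independence assumption of Naïve Bayes does not come into play, so the sketch only shows the generative idea.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_gaussian_nb(xs, ys):
    """Generative: estimate the class prior plus mean/variance of x per class."""
    params = {}
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[label] = (len(vals) / len(xs), mean, var)
    return params

def nb_posterior(params, x):
    """P(y=1 | x) via Bayes' rule from the learned input distributions."""
    def joint(label):
        prior, mean, var = params[label]
        density = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        return prior * density
    j0, j1 = joint(0), joint(1)
    return j1 / (j0 + j1)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Discriminative: gradient descent on log-loss for P(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Synthetic, standardized "age-like" feature (hypothetical): cases with a
# complication (y=1) are drawn from a distribution with a higher mean.
rng = random.Random(0)
ys = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
xs = [rng.gauss(1.0 if y else -0.5, 1.0) for y in ys]

nb = fit_gaussian_nb(xs, ys)
w, b = fit_logistic(xs, ys)

print("Naive Bayes P(y=1 | x=2):", round(nb_posterior(nb, 2.0), 3))
print("Logistic    P(y=1 | x=2):", round(sigmoid(w * 2.0 + b), 3))
print("Logistic weight (positive means risk rises with x):", round(w, 3))
```

The sign of the fitted logistic weight is what tells you whether the predicted risk goes up or down with the variable, which is the interpretability point made above. A GAM would replace the single term w*x with a smooth function of x.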
The results in the article show that both GAM and logistic regression performed better and fit better than Naïve Bayes and SVM, with an area under the curve (AUC) above 0.80 that was between 0.022 and 0.03 higher for predicting acute kidney injury and severe sepsis, respectively. GAM addressed the non-linearity of continuous clinical variables. The risk patterns for hemoglobin and plasma hematocrit showed that non-linear models capture risk variation more effectively than linear models.
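For readers less familiar with AUC, the sketch below shows one way to compute it and why a non-linear risk function can score higher than a linear one. It is only an illustration under stated assumptions: the data are synthetic, the U-shaped risk curve for a hypothetical centered lab value is made up, and the auc helper is a simple pairwise implementation rather than anything taken from the article.

```python
import math
import random

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Tiny hand-checkable case: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Synthetic data with a U-shaped risk: both low and high values of the
# (hypothetical, centered) lab value x raise the chance of a complication.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(400)]
risk = [1.0 / (1.0 + math.exp(2.0 - 1.5 * x * x)) for x in xs]
ys = [1 if rng.random() < r else 0 for r in risk]

# Any model that is linear in x ranks patients by x or by -x, so take the better one.
auc_linear = max(auc(ys, xs), auc(ys, [-x for x in xs]))
# A non-linear score (here simply x squared) can follow the U shape.
auc_nonlinear = auc(ys, [x * x for x in xs])

print("best linear AUC:", round(auc_linear, 3))
print("non-linear AUC :", round(auc_nonlinear, 3))
```

Because AUC depends only on how patients are ranked, a score that is monotone in x cannot separate the low-value and high-value risk groups at the same time. That is the kind of gap a smooth, non-linear term is meant to close.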
Thottakkara P, Ozrazgat-Baslanti T, Hupf BB, Rashidi P, Pardalos P, Momcilovic P, et al. (2016) Application of Machine Learning Techniques to High Dimensional Clinical Data to Forecast Postoperative Complications. PLOS ONE 11(5): e0155705. doi: 10.1371/journal.pone.0155705. 1-19.
Anonymous (2016) Sepsis