01-17-2017 05:58 AM
I used several predictive models to score the probability that an observation (a customer) will stop paying his bill.
I used logistic regression, a forest, and SVM.
The best model for my population was the forest, with AUC = 0.81 and misclassification rate = 0.058.
Do these results reflect good predictive ability of the model?
01-23-2017 02:08 PM
It depends on what the prediction would be without a model. If the proportion of observations with the most common target value in the data is near 1 - 0.058 = 0.942, then a misclassification rate of 0.058 is not good, because always predicting the most common value would do about as well. On the other hand, if the proportion is around 1/2, then 0.058 is a great number.
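To make the comparison concrete, here is a minimal Python sketch. The class counts are made up for illustration (they are not from the original poster's data), and `majority_baseline_error` is a hypothetical helper name.

```python
def majority_baseline_error(labels):
    """Misclassification rate of always predicting the most common class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - max(counts.values()) / len(labels)


model_error = 0.058  # misclassification rate reported in the question

# Hypothetical case A: 94.2% of customers keep paying (0), 5.8% stop (1).
imbalanced = [0] * 942 + [1] * 58
baseline_a = majority_baseline_error(imbalanced)

# Hypothetical case B: the two classes are balanced.
balanced = [0] * 500 + [1] * 500
baseline_b = majority_baseline_error(balanced)

# Relative error reduction over the no-model baseline in case B.
reduction_b = (baseline_b - model_error) / baseline_b

print(f"case A baseline error: {baseline_a:.3f}")
print(f"case B baseline error: {baseline_b:.3f}")
print(f"case B relative error reduction: {reduction_b:.3f}")
```

In case A the model is no better than always predicting "keeps paying"; in case B the same 0.058 removes most of the baseline error.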
I suspect an AUC of 0.81 is good, because it is much larger than 0.5, the value expected from random scoring.
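For intuition on why 0.5 is the reference point: AUC equals the probability that a randomly chosen event gets a higher score than a randomly chosen non-event, with ties counted as one half. A small sketch of that pairwise definition, with made-up scores and a hypothetical function name `pairwise_auc` (for real data sizes you would use a library routine instead):

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of rows,
    so this is for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: three of the four (event, non-event) pairs are ordered correctly.
example_auc = pairwise_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])

# A score that carries no information (all ties) gives 0.5.
uninformative_auc = pairwise_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])

print(example_auc, uninformative_auc)
```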
01-24-2017 06:16 AM
Adding to Padraic's great comments: rather than focusing on one number, you may also use the cumulative captured response values at different percentile thresholds to decide if the model is good enough.
Let's say you have a budget to take action for the top 5 percent of your population (send a reminder SMS, call from the contact center, etc.). What would be the response rate of your model in the top 5 percent vs. the overall event rate (random selection)? There might be cases where a model that has a lower AUC than the champion model performs better at the extreme percentiles. You may also compute the total loss (unpaid invoices) in the top buckets to justify the value of your model before deploying it in production.
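A rough sketch of that top-percentile check in Python. The scores and labels are invented for illustration, and `top_percent_stats` is a hypothetical helper, not something from a particular tool. It reports the response rate among the highest-scored customers, the lift over the overall event rate, and the share of all events captured.

```python
def top_percent_stats(scores, labels, percent):
    """Response rate, lift, and captured response in the top `percent` by score."""
    n = len(scores)
    k = max(1, int(n * percent / 100))  # number of customers you can act on
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    events_top = sum(y for _, y in ranked[:k])
    events_all = sum(labels)
    response_rate = events_top / k
    overall_rate = events_all / n
    return {
        "response_rate": response_rate,         # event rate among the top k
        "lift": response_rate / overall_rate,   # vs. random selection
        "captured": events_top / events_all,    # share of all events found
    }


# Invented data: 20 customers, 4 of whom stop paying (label 1).
scores = [i / 20 for i in range(20)]
labels = [0] * 20
for idx in (3, 10, 18, 19):
    labels[idx] = 1

stats = top_percent_stats(scores, labels, percent=10)
print(stats)
```

Running the same function for two candidate models at the percentile your budget allows shows which one is better where it matters, even if its AUC is lower. Summing unpaid invoice amounts instead of counting events would give the loss-based version.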