
The Tail of 3 Models – The Story of Goodness of Fit with Binary Classification

Started 10-01-2013
Modified 10-06-2015


Before you select the best model based on your favorite goodness-of-fit statistic (mean squared error, Gini, K-S, AUC, or misclassification rate), STOP! Model performance metrics are not a one-size-fits-all measure. As an analyst, selecting the right performance metric might mean the difference between an exceptionally good result and no result at all.


The classic example: suppose the event of interest has only a 3% prevalence in my data. I can build a model that is 97% accurate (3% error rate) yet NEVER detects the event of interest! In fact, I don’t even need to build a model to get this result: I can just guess “No” 100% of the time.
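To make the arithmetic concrete, here is a minimal sketch in Python. The 1,000-case data set is made up for illustration (it is not the data behind this article); the only thing taken from the text is the 3% event rate.

```python
# Hypothetical data: 1,000 cases with a 3% event rate (30 events, 970 non-events).
n_cases = 1000
n_events = 30
actual = [1] * n_events + [0] * (n_cases - n_events)

# The "no model" baseline: guess "No" (0) for every single case.
predicted = [0] * n_cases

# Accuracy: share of cases where the guess matches the actual outcome.
correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / n_cases

# Sensitivity: share of actual events that were flagged as events.
true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
sensitivity = true_positives / n_events

print(f"accuracy    = {accuracy:.0%}")     # 97%
print(f"sensitivity = {sensitivity:.0%}")  # 0%
```

The baseline scores 97% on accuracy while finding none of the 30 events, which is why accuracy alone says so little when the event is rare.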

Much of the time, the reason you are modeling a binary outcome is that you face limited resources and need to decide where to focus your efforts to maximize returns. While these classic performance metrics describe overall fit, they are not very helpful in discerning which model performs best at a given depth in the scored list, that is, among the top (or bottom) x% of cases ranked by predicted probability. Rather than looking at the overall performance of the model, you need to look at the tails.

In the ‘Cumulative % Captured Response’ chart below, the performance of three models developed in Enterprise Miner is shown. If I can only go after 2% of the population, I would select the decision tree model, which captures 60% of the responses at that depth. Conversely, if I want to drop the 20% least likely cases, I should select the SVM. Just eyeballing this chart, I might think that the regression model gives the best result, and it does have the highest ROC index (area under the ROC curve) and Gini coefficient.
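[Figure (gof.jpg): Cumulative % Captured Response chart comparing the decision tree, SVM, and regression models]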
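For readers who want to compute the quantity plotted in a cumulative % captured response chart, the following is a rough sketch in Python. The function name `captured_response`, the 20-case data set, and the two sets of scores are all hypothetical; they are not the Enterprise Miner models or results discussed here. The sketch ranks cases by score, takes the top fraction of the list, and reports the share of all events that fall in that slice.

```python
def captured_response(scores, actual, depth):
    """Share of all events found in the top `depth` fraction of cases,
    where cases are ranked from highest to lowest score."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    total_events = sum(actual)
    if total_events == 0:
        raise ValueError("actual contains no events")
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    events_in_top = sum(label for _, label in ranked[:n_top])
    return events_in_top / total_events


# Hypothetical data: 20 cases, the first 4 of which are events.
actual = [1, 1, 1, 1] + [0] * 16

# Made-up model A: two events ranked at the very top, two at the very bottom.
scores_a = [0.99, 0.98, 0.02, 0.01] + [0.90 - 0.05 * i for i in range(16)]

# Made-up model B: all four events ranked high, but just below two non-events.
scores_b = [0.80, 0.79, 0.78, 0.77] + [0.95, 0.94] + [0.70 - 0.04 * i for i in range(14)]

for depth in (0.10, 0.50):
    a = captured_response(scores_a, actual, depth)
    b = captured_response(scores_b, actual, depth)
    print(f"depth {depth:.0%}: model A captures {a:.0%}, model B captures {b:.0%}")
```

With these made-up scores, model A captures 50% of the events in the top 10% of the list and model B captures none, yet at a depth of 50% model B captures 100% and model A is still at 50%. Which model is "best" depends on how deep into the list you can afford to go.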

In this modeling scenario, I focused my efforts on finding the model that performs best in the tails. Other approaches may need to be considered depending on your objective: minimizing false positives in fraud detection, minimizing false negatives in health care, or identifying the most stable model. As always, the business objective needs to be considered every step along the way.

