Morning all,
This is my first foray into the world of predictive modelling.
I'm attempting to predict the prevalence of an event that occurs in roughly 1% of my dataset: 1,400 events in 118,000 records. The event is 'L', meaning 'Large customers', in a known data set of small/medium/large customers.
I've set this as a binary target and set ordering to ascending so that it attempts to predict the rare event.
1. Firstly, is this the correct way to approach things, or should I be adjusting the prior probabilities and oversampling instead?
2. The difficulty I'm having is that the models are trying to be 'too predictive'. For example, am I right in thinking the attached matrix suggests I would 'lose' 206 large customers for every 34 I predict correctly?
If I were providing my sales guys with a list of clients, my preference would be to improve on the chance rate of 1 in 100 being large to something like 7 in 100, whilst 'losing' the smallest number of potential leads.
I hope this makes sense, if you need any more clarity please let me know.
I'm using Enterprise Miner 14.1 and I've been looking at mainly Decision Trees and Regression models.
F
Misclassification tables can be very misleading in rare event scenarios. Those tables are typically built using either the default target profile (the most likely outcome is the prediction) or a weighted outcome based on decision weights which you have entered (the most valuable outcome is the one predicted). In practice, you should choose a threshold for your decision after looking at how the model performs, taking into consideration the different types of error you might make (e.g. is it more problematic to predict an 'event' as a 'non-event', or vice versa?). Your choice of the 'best' cutoff can change depending on your goal and the risk/reward associated with each outcome. There is a good thread discussing some of your options at
Hope this helps!
Doug
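To make the cutoff idea in the reply above concrete, here is a minimal sketch in Python rather than Enterprise Miner. It assumes you have exported a predicted probability and the actual 0/1 outcome for each customer. The function names (`confusion_at_cutoff`, `sweep_cutoffs`, `best_cutoff`) and the toy data are hypothetical, made up purely for illustration. It counts the four outcomes at each candidate cutoff, then picks the cutoff that keeps the most true large customers (highest recall) while still hitting a minimum hit rate on the list (precision, e.g. 0.07 for "7 in 100").

```python
from typing import Dict, List, Optional, Sequence, Tuple


def confusion_at_cutoff(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when every score >= cutoff is flagged as an event."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= cutoff
        if flagged and label == 1:
            tp += 1  # large customer correctly put on the list
        elif flagged:
            fp += 1  # small/medium customer put on the list
        elif label == 1:
            fn += 1  # large customer 'lost' (left off the list)
        else:
            tn += 1
    return tp, fp, fn, tn


def sweep_cutoffs(
    scores: Sequence[float], labels: Sequence[int], cutoffs: Sequence[float]
) -> List[Dict[str, float]]:
    """Compute list size, precision and recall for each candidate cutoff."""
    rows = []
    for cutoff in cutoffs:
        tp, fp, fn, _ = confusion_at_cutoff(scores, labels, cutoff)
        flagged = tp + fp
        rows.append({
            "cutoff": cutoff,
            "flagged": flagged,
            "precision": tp / flagged if flagged else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        })
    return rows


def best_cutoff(
    rows: List[Dict[str, float]], min_precision: float
) -> Optional[Dict[str, float]]:
    """Among cutoffs meeting the precision target, keep the one with highest recall."""
    eligible = [r for r in rows if r["precision"] >= min_precision]
    if not eligible:
        return None
    # Ties on recall are broken in favour of the higher precision.
    return max(eligible, key=lambda r: (r["recall"], r["precision"]))


if __name__ == "__main__":
    # Toy data only (hypothetical): six customers, two of them large.
    toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    toy_labels = [1, 0, 1, 0, 0, 0]
    table = sweep_cutoffs(toy_scores, toy_labels, [0.85, 0.5, 0.2, 0.05])
    for row in table:
        print(row)
    print("chosen:", best_cutoff(table, min_precision=0.5))
```

With real scores you would sweep a finer grid of cutoffs and set `min_precision` to the hit rate the sales team needs. The point is that the cutoff is chosen from this trade-off rather than taken from the default misclassification table.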