Morning all,

This is my first foray into the world of predictive modelling. I'm attempting to predict an event that occurs in roughly 1% of my data: 1,400 events out of 118,000 records. The target 'L' means 'Large customer' in a known dataset of small/medium/large customers. I've set this as a binary target and set the ordering to ascending so that the models attempt to predict the rare event.

1. Firstly, is this the correct way to approach things, or should I be adjusting the prior probabilities and oversampling instead?

2. The difficulty I'm having is that the models are trying to be 'too predictive'. For example, am I right in thinking the attached confusion matrix suggests I would 'lose' 206 large customers for every 34 I predict correctly? If I were giving my sales team a list of clients, my preference would be to improve on the 1-in-100 chance of a client being large to something like 7 in 100, while 'losing' as few potential leads as possible.

I hope this makes sense; if you need any more clarity, please let me know. I'm using Enterprise Miner 14.1 and have mainly been looking at Decision Tree and Regression models.

F
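To show what I mean with the numbers, here's a rough plain-Python sketch (not Enterprise Miner code). Part (1) uses only the counts visible in the attached matrix (34 caught, 206 missed; I haven't reproduced the false positives/true negatives). Part (2) is the standard prior-probability correction I've seen suggested for models trained on oversampled data; the function name `correct_posterior` is my own.

```python
# (1) From the attached matrix: TP = 34 large customers caught, FN = 206 missed.
tp, fn = 34, 206
recall = tp / (tp + fn)            # fraction of large customers the model catches
base_rate = 1400 / 118000          # ~1 in 100 chance a random client is large
target_hit_rate = 7 / 100          # what I'd like a called list to achieve
print(f"recall {recall:.1%}; target is ~{target_hit_rate / base_rate:.1f}x lift over chance")

# (2) If I oversample so that large customers make up rho1 of the training data
#     while the true population rate is pi1, a model's raw score p can be
#     mapped back to the population scale like this:
def correct_posterior(p, rho1, pi1):
    """Adjust a posterior from oversampled training data back to true priors."""
    rho0, pi0 = 1 - rho1, 1 - pi1
    num = p * pi1 / rho1
    return num / (num + (1 - p) * pi0 / rho0)

# e.g. a raw score of 0.5 from a model trained on a 50/50 sample maps back
# to the base rate, as you'd expect:
print(correct_posterior(0.5, 0.5, base_rate))
```

So a 7-in-100 list would be roughly a 5.9x lift over chance, and the model above is currently catching about 14% of large customers at whatever cutoff produced that matrix.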