Hi,
I have built an attrition model and I am evaluating its performance. I sorted the predicted probabilities from high to low, divided the customers into ten
equally sized groups ("deciles") so that each decile contains ten percent of the customer base, and looked at the attrition rate by decile, using the code below:
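/* Rank customers by predicted probability P; descending puts the highest P in group 0 */
proc rank data=OUT groups=10 out=OUT_DECILE descending;
  var P;
  ranks decile;
run;

/* PROC RANK numbers the groups 0-9; shift them to 1-10 */
data OUT_DECILE;
  set OUT_DECILE;
  decile = decile + 1;
run;

/* Count, attrition rate (mean of LAPSE) and number of lapses per decile */
proc means data=OUT_DECILE n mean sum;
  class decile;
  var LAPSE;
run;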
I first ran it on the training dataset used to build the model, then scored the validation dataset and ranked the probabilities again. The results for both samples are in the tables below.
Should I not get roughly the same attrition rate per decile in both samples? Does this mean my model is not performing well? I compared the gain charts for validation and training and they looked fine.
Your help would be much appreciated. Many thanks.
Analysis Variable: LAPSE (Training Sample)
Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
1 | 20,986 | 79% | 30% |
2 | 21,014 | 70% | 30% |
3 | 20,999 | 38% | 30% |
4 | 21,041 | 29% | 30% |
5 | 17,839 | 25% | 30% |
6 | 24,168 | 22% | 30% |
7 | 20,952 | 20% | 30% |
8 | 21,013 | 12% | 30% |
9 | 20,998 | 7% | 30% |
10 | 20,990 | 5% | 30% |
Analysis Variable: LAPSE (Validation Sample)
Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
1 | 9,034 | 100% | 21% |
2 | 9,034 | 100% | 21% |
3 | 9,034 | 13% | 21% |
4 | 8,968 | 0% | 21% |
5 | 10,822 | 0% | 21% |
6 | 7,312 | 0% | 21% |
7 | 9,033 | 0% | 21% |
8 | 9,014 | 0% | 21% |
9 | 9,055 | 0% | 21% |
10 | 9,034 | 0% | 21% |
I have attached the gain chart. Many thanks.
Knowing nothing else, it seems to me that your model is not generalizing well from the training set to the validation set, which is usually a sign of overfitting.
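One check that may be worth running first: in the validation table, deciles 1 and 2 lapse at 100% and deciles 4 to 10 at 0%, which is a near-perfect separation, so it could help to confirm what was actually ranked. A minimal sketch, assuming the ranked validation dataset is called VAL_DECILE (a hypothetical name; P, LAPSE and decile are as in the code you posted):

```sas
/* VAL_DECILE is a hypothetical name for the ranked validation dataset. */
/* For each decile, show the range and mean of the predicted            */
/* probability P next to the observed attrition rate (mean of LAPSE).   */
proc means data=VAL_DECILE n min max mean;
  class decile;
  var P LAPSE;
run;
```

If P in the validation sample turns out to be almost exactly 0 or 1 and tracks LAPSE one for one, the scoring step may be using the outcome, or a variable derived from it, rather than the model fitted on the training data. That is only a guess from the table, not something I can confirm from here.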
What tool are you using to create the initial model, and what technique?
Hi,
I am using SAS and logistic regression. But the gain chart suggests that the model is robust. Please see attached.
Many Thanks
Alice