Hi,
I have built an attrition model and am evaluating its performance. I sorted the predicted probabilities from high to low and divided the customers into ten
equally sized groups called "deciles", so that ten percent of the customer base is contained in each decile. I then looked at model performance in terms of attrition rate by decile, using the code below:
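/* Rank customers by predicted probability P; DESCENDING puts the highest scores in group 0 */
proc rank data=OUT groups=10 out=OUT_DECILE descending;
var P; ranks decile;
run;
/* PROC RANK numbers the groups 0-9, so shift them to 1-10 */
data OUT_DECILE;
set OUT_DECILE;
decile=decile+1;
run;
/* Count, observed lapse rate and number of lapses per decile */
proc means data=OUT_DECILE n mean sum;
var LAPSE;
class decile;
run;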
I ran it first on the training dataset used to build the model and got the first table below. I then scored the validation dataset, ranked the probabilities again, and got the second table below.
Should I not get roughly the same % per decile? Does this mean my model is not performing well? I used the gain chart to compare validation and training, and the two looked fine.
Your help would be much appreciated. Many thanks
Analysis Variable: LAPSE (Training Sample)
Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
1 | 20,986 | 79% | 30% |
2 | 21,014 | 70% | 30% |
3 | 20,999 | 38% | 30% |
4 | 21,041 | 29% | 30% |
5 | 17,839 | 25% | 30% |
6 | 24,168 | 22% | 30% |
7 | 20,952 | 20% | 30% |
8 | 21,013 | 12% | 30% |
9 | 20,998 | 7% | 30% |
10 | 20,990 | 5% | 30% |
Analysis Variable: LAPSE (Validation Sample)
Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
1 | 9,034 | 100% | 21% |
2 | 9,034 | 100% | 21% |
3 | 9,034 | 13% | 21% |
4 | 8,968 | 0% | 21% |
5 | 10,822 | 0% | 21% |
6 | 7,312 | 0% | 21% |
7 | 9,033 | 0% | 21% |
8 | 9,014 | 0% | 21% |
9 | 9,055 | 0% | 21% |
10 | 9,034 | 0% | 21% |
I have attached the gain chart. Many thanks
Knowing nothing else, it seems to me that your training model is not generalizing well to the validation set, which is usually a sign of overfitting.
What tool are you using to create the initial model, and what technique?
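In the meantime, one way to look at the generalization question is to put both samples in a single table and compare, decile by decile, the observed lapse rate with the mean predicted probability. This is only a rough sketch: TRAIN_DECILE and VALID_DECILE are hypothetical names for the two PROC RANK outputs (the original post reuses OUT_DECILE for both), and it assumes each one still carries P, LAPSE and the 1-10 decile.

```sas
/* Assumption: TRAIN_DECILE and VALID_DECILE are the ranked training and
   validation datasets, each with the variables DECILE, P and LAPSE. */
data both;
    set train_decile (in=in_train) valid_decile;
    length sample $10;
    /* Flag which sample each row came from */
    if in_train then sample = 'Training';
    else sample = 'Validation';
run;

/* Mean of LAPSE = observed lapse rate; mean of P = average predicted rate.
   MIN and MAX of P show the score range that falls into each decile. */
proc means data=both n mean min max;
    class sample decile;
    var lapse p;
run;
```

If the mean of P in a decile is far from the mean of LAPSE in one sample but not in the other, or the score ranges per decile differ a lot between samples, that would suggest the two datasets were not scored or prepared in the same way, which is worth ruling out before concluding that the model overfits.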
Hi,
I am using SAS and logistic regression. However, the gain chart suggests that the model is robust. Please see attached.
Many Thanks
Alice