Question
Fluorite | Level 6

Hi,

I have built an attrition model and am evaluating its performance. I sorted the predicted probabilities from high to low and divided the customers into ten equally sized groups ("deciles"), so that each decile contains ten percent of the customer base. I then looked at the attrition rate by decile, using the code below.

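/* PROC RANK numbers the groups 0-9; with DESCENDING, group 0 holds the highest probabilities */
proc rank data=OUT groups=10 out=OUT_DECILE descending;
    var P;
    ranks decile;
run;

/* Shift the ranks to 1-10 so that decile 1 is the highest-probability group */
data OUT_DECILE;
    set OUT_DECILE;
    decile = decile + 1;
run;

/* Count, lapse rate, and number of lapses per decile */
proc means data=OUT_DECILE n mean sum;
    var LAPSE;
    class decile;
run;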

I first ran it on the training dataset used to build the model and got the first table below. I then scored the validation dataset, ranked the probabilities again, and got the second table below. Shouldn't I get roughly the same percentage per decile? Does this mean my model is not performing well? I used the gain chart to compare validation and training, and they looked fine. Your help would be much appreciated. Many thanks.

Analysis Variable: LAPSE (Training Sample)

Rank for Variable pred   N Obs    Decile Mean   Overall Mean
1                        20,986   79%           30%
2                        21,014   70%           30%
3                        20,999   38%           30%
4                        21,041   29%           30%
5                        17,839   25%           30%
6                        24,168   22%           30%
7                        20,952   20%           30%
8                        21,013   12%           30%
9                        20,998   7%            30%
10                       20,990   5%            30%

Analysis Variable: LAPSE (Validation Sample)

Rank for Variable pred   N Obs    Decile Mean   Overall Mean
1                        9,034    100%          21%
2                        9,034    100%          21%
3                        9,034    13%           21%
4                        8,968    0%            21%
5                        10,822   0%            21%
6                        7,312    0%            21%
7                        9,033    0%            21%
8                        9,014    0%            21%
9                        9,055    0%            21%
10                       9,034    0%            21%

I have attached the gain chart. Many thanks.

3 REPLIES
adjgiulio
Obsidian | Level 7

Knowing nothing else, it seems to me that your model is not generalizing well from the training set to the validation set, which is usually a sign of overfitting.

What tool are you using to create the initial model, and what technique?

Question
Fluorite | Level 6

Hi,

I am using SAS and logistic regression. However, the gain chart shows that the model is robust. Please see the attached chart.

Many Thanks

Alice

AmitKB
Fluorite | Level 6
Hi,

Did you use the training decile definitions for the validation data? For example, if the first decile in the training data covered probabilities from, say, 0.90 to 0.95, then you should use that same range to define the first decile in the validation data.

If you did, great. Otherwise, use the training data decile definitions to compare training with validation.

Best regards,
Amit
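One way to reuse the training decile definitions could look roughly like the sketch below. It is only an illustration under several assumptions: OUT_DECILE is the ranked training dataset from the original post (with P and decile), VALID_SCORED is a hypothetical name for the scored validation dataset (with P and LAPSE), and TRAIN_CUTS, VALID_DECILE, min_p, train_decile, cutoff, and d are made-up names. It also assumes all ten deciles are present in the training data.

```sas
/* Step 1: lowest predicted probability in each training decile */
proc means data=OUT_DECILE noprint nway;
    class decile;
    var P;
    output out=TRAIN_CUTS(keep=decile min_p) min=min_p;
run;

/* Step 2: assign validation records to deciles using the training cutoffs */
data VALID_DECILE;
    /* Temporary array holding the training lower bound of each decile */
    array cutoff[10] _temporary_;

    /* Load the ten training cutoffs once, on the first iteration */
    if _n_ = 1 then do until (last_cut);
        set TRAIN_CUTS(rename=(decile=train_decile)) end=last_cut;
        cutoff[train_decile] = min_p;
    end;

    /* VALID_SCORED is a hypothetical name for the scored validation data */
    set VALID_SCORED;

    /* Decile 1 holds the highest probabilities; anything below the
       decile 9 lower bound falls into decile 10 */
    decile = 10;
    do d = 1 to 9;
        if P >= cutoff[d] then do;
            decile = d;
            leave;
        end;
    end;

    drop d train_decile min_p;
run;

/* Step 3: same summary as for training, now on comparable deciles */
proc means data=VALID_DECILE n mean sum;
    var LAPSE;
    class decile;
run;
```

With this approach the validation deciles will generally not be equally sized. If the N per decile differs a lot from the training table, that is a hint that the score distribution has shifted between the two samples.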
