Help, Attrition Model Performance in SAS. Thanks

04-29-2013 10:28 AM

Hi,

I have built an attrition model and I am evaluating its performance. I sorted the predicted probabilities from high to low and divided the customers into ten equally sized groups called "deciles", so that each decile contains ten percent of the customer base, and then looked at model performance in terms of attrition rate by decile, using the code below.

    proc rank data=OUT groups=10 out=OUT_DECILE descending;
      var P;
      ranks decile;
    run;

    data OUT_DECILE;
      set OUT_DECILE;
      decile = decile + 1;
    run;

    proc means data=OUT_DECILE n mean sum;
      var LAPSE;
      class decile;
    run;

I first ran it on the training dataset used to build the model and got the first table below. I then scored the validation dataset, ranked the probabilities again, and got the second table. Shouldn't I get roughly the same percentage per decile? Does this mean my model is not performing well? I used the gain chart to compare validation and training, and they looked fine. Your help would be much appreciated. Many thanks.

Analysis Variable: LAPSE (Training Sample)

| Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
|---|---|---|---|
| 1 | 20,986 | 79% | 30% |
| 2 | 21,014 | 70% | 30% |
| 3 | 20,999 | 38% | 30% |
| 4 | 21,041 | 29% | 30% |
| 5 | 17,839 | 25% | 30% |
| 6 | 24,168 | 22% | 30% |
| 7 | 20,952 | 20% | 30% |
| 8 | 21,013 | 12% | 30% |
| 9 | 20,998 | 7% | 30% |
| 10 | 20,990 | 5% | 30% |

Analysis Variable: LAPSE (Validation Sample)

| Rank for Variable pred | N Obs | Decile Mean | Overall Mean |
|---|---|---|---|
| 1 | 9,034 | 100% | 21% |
| 2 | 9,034 | 100% | 21% |
| 3 | 9,034 | 13% | 21% |
| 4 | 8,968 | 0% | 21% |
| 5 | 10,822 | 0% | 21% |
| 6 | 7,312 | 0% | 21% |
| 7 | 9,033 | 0% | 21% |
| 8 | 9,014 | 0% | 21% |
| 9 | 9,055 | 0% | 21% |
| 10 | 9,034 | 0% | 21% |

I have attached the gain chart. Many thanks.


04-29-2013 01:50 PM

Knowing nothing else, it seems to me that your model is not generalizing well from the training set to the validation set, which is usually a sign of overfitting.

What tool are you using to create the initial model, and what technique?


04-29-2013 06:00 PM

Hi,

I am using SAS and logistic regression. But the gain chart shows that the model is robust. Please see the attached chart.

Many Thanks

Alice


05-10-2013 02:39 PM

Hi, did you use the training decile definitions for the validation data? For example, suppose that in the training data the first decile has minimum and maximum probabilities of, say, 0.90 and 0.95. Then you should use the same probability range to define the first decile in the validation data. If you have already done that, great; otherwise, use the training data's decile definitions to compare training with validation. Best regards, Amit
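
One way to apply the training decile definitions to the validation data might look like the sketch below. It is only an illustration under stated assumptions, not the poster's actual code: `OUT_DECILE` is taken to be the ranked training dataset from the original post (with `decile` already shifted to 1-10), while `VALID_SCORED`, `TRAIN_CUTS`, `CUTS_WIDE`, and `VALID_DECILE` are hypothetical dataset names. `VALID_SCORED` is assumed to hold the scored validation data with the same `P` and `LAPSE` variables.

```sas
/* Assumption: OUT_DECILE is the ranked training data (decile = 1..10).  */
/* Assumption: VALID_SCORED is a hypothetical scored validation dataset  */
/* containing the predicted probability P and the outcome LAPSE.         */

/* 1. Lowest predicted probability in each training decile */
proc means data=OUT_DECILE noprint nway;
  class decile;
  var P;
  output out=TRAIN_CUTS min=min_p;
run;

/* 2. Put the nine lower bounds (deciles 1-9) on one row as cut1-cut9 */
proc transpose data=TRAIN_CUTS(where=(decile < 10)) out=CUTS_WIDE prefix=cut;
  id decile;
  var min_p;
run;

/* 3. Assign each validation row to a decile using the training cutoffs */
data VALID_DECILE;
  if _n_ = 1 then set CUTS_WIDE(keep=cut1-cut9);  /* cutoffs are retained on every row */
  set VALID_SCORED;
  array cuts{9} cut1-cut9;
  decile = 10;                 /* default: below every training cutoff */
  do i = 1 to 9;
    if P >= cuts{i} then do;   /* first (highest) cutoff that P reaches */
      decile = i;
      leave;
    end;
  end;
  drop i cut1-cut9;
run;

/* 4. Attrition rate by training-defined decile on the validation data */
proc means data=VALID_DECILE n mean sum;
  class decile;
  var LAPSE;
run;
```

With this approach the validation groups are no longer forced to be equally sized. The `N` column then shows how the validation scores are distributed across the training probability ranges, and the `mean` column can be compared decile by decile with the training table.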