Dear all modelling experts,
I have built an attrition model using logistic regression, with a two-month operational gap, to estimate the probability that a customer will churn. In model development the fit looks very good, with ROC (.94) and Gini (.75). However, when I look at the predicted probability of the event (P_1), the scores look rather low even though the events did occur. Could anyone suggest how I could adjust the probability scores? For example, if I target the top 3 deciles of the population, I have to include a customer whose predicted chance of attrition is only 30%; how could I reweight that customer to 90%? I have tried reweighting by AIC or -2 Log L, but the scores stay the same. Expert suggestions would be greatly appreciated.
Please find below the average score distribution by decile from the model scores:
Decile | N Obs | N | Mean | Std Dev | Minimum | Maximum |
0 | 13828 | 13828 | 0.000458541 | 0.000176654 | 0.000253029 | 0.000682157 |
1 | 9468 | 9468 | 0.0011240 | 0.000118160 | 0.000732903 | 0.0011879 |
2 | 11772 | 11772 | 0.0014577 | 0.000287458 | 0.0012162 | 0.0020466 |
3 | 12338 | 12338 | 0.0022044 | 0.000060845 | 0.0021169 | 0.0022530 |
4 | 11304 | 11304 | 0.0037133 | 0.000852377 | 0.0022655 | 0.0053192 |
5 | 11802 | 11802 | 0.0087238 | 0.0021891 | 0.0054045 | 0.0131404 |
6 | 11682 | 11682 | 0.0150799 | 0.0024730 | 0.0131558 | 0.0232916 |
7 | 11843 | 11843 | 0.0411295 | 0.0108332 | 0.0236994 | 0.0622775 |
8 | 10159 | 10159 | 0.1500261 | 0.0759429 | 0.0623467 | 0.2864124 |
9 | 13409 | 13409 | 0.4677841 | 0.1755924 | 0.2919547 | 0.9407925 |
I have also provided the model output for your convenience:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion | Intercept Only | Intercept and Covariates |
AIC | 188448 | 90327.87 |
SC | 188458.33 | 90462.22 |
-2 Log L | 188446.00 | 90301.87 |
Testing Global Null Hypothesis: BETA=0
Test | Chi-Square | DF | Pr > ChiSq |
Likelihood Ratio | 98144.1267 | 12 | <.0001 |
Score | 89795.0913 | 12 | <.0001 |
Wald | 32295.415 | 12 | <.0001 |
Percent Concordant | 94.3 | Somers' D | 0.893 |
Percent Discordant | 5 | Gamma | 0.9 |
Percent Tied | 0.7 | Tau-a | 0.222 |
Pairs | 6419968407 | c | 0.947 |
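As a sanity check on the association table above, the four statistics can be recomputed from the concordant, discordant and tied percentages. The sketch below is in Python rather than SAS, purely for illustration; the variable names are made up, and `n_obs` is an assumption taken as the sum of the "Total" column in the Hosmer and Lemeshow partition further down (227,432 observations). The formulas are the usual pair-based definitions, so small differences from the printed values are expected because the percentages are rounded.

```python
# Recompute Somers' D, Gamma, Tau-a and c from the reported percentages.
pct_concordant = 94.3
pct_discordant = 5.0
pct_tied = 0.7
pairs = 6419968407   # event / non-event pairs, as reported
n_obs = 227432       # assumption: sum of "Total" in the HL partition table

nc = pct_concordant / 100   # share of concordant pairs
nd = pct_discordant / 100   # share of discordant pairs
nt = pct_tied / 100         # share of tied pairs

somers_d = nc - nd                      # (concordant - discordant) / pairs
gamma = (nc - nd) / (nc + nd)           # ties excluded from the denominator
c_stat = nc + 0.5 * nt                  # area under the ROC curve
tau_a = (nc - nd) * pairs / (0.5 * n_obs * (n_obs - 1))

print(f"Somers' D = {somers_d:.3f}")
print(f"Gamma     = {gamma:.3f}")
print(f"Tau-a     = {tau_a:.3f}")
print(f"c         = {c_stat:.3f}")
```

All four statistics depend only on how the scores rank events against non-events, not on the absolute level of the predicted probabilities.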
Partition for the Hosmer and Lemeshow Test
Group | Total | response = 1: Observed | response = 1: Expected | response = 0: Observed | response = 0: Expected |
1 | 25145 | 44 | 9.51 | 25101 | 25135.49 |
2 | 18536 | 30 | 16.71 | 18506 | 18519.29 |
3 | 19557 | 53 | 25.88 | 19504 | 19531.12 |
4 | 22796 | 66 | 44.57 | 22730 | 22751.43 |
5 | 22089 | 84 | 102.85 | 22005 | 21986.15 |
6 | 22674 | 340 | 251.13 | 22334 | 22422.87 |
7 | 22745 | 802 | 709.69 | 21943 | 22035.31 |
8 | 22445 | 2860 | 3013.02 | 19585 | 19431.98 |
9 | 20979 | 6777 | 7559.15 | 14202 | 13419.85 |
10 | 30466 | 21967 | 21290.48 | 8499 | 9175.52 |
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square | DF | Pr > ChiSq |
428.9279 | 8 | <.0001 |
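For reference, the Hosmer and Lemeshow chi-square can be reproduced by hand from the partition table: for each group, add (observed - expected)^2 / expected for both the events and the non-events. The Python sketch below (not SAS; names are illustrative) uses the totals, observed events and expected events exactly as printed. Because the expected counts are rounded to two decimals, the result should land close to, but not exactly on, the reported 428.9279.

```python
# (total, observed events, expected events) for each HL group, from the partition table
groups = [
    (25145, 44, 9.51),
    (18536, 30, 16.71),
    (19557, 53, 25.88),
    (22796, 66, 44.57),
    (22089, 84, 102.85),
    (22674, 340, 251.13),
    (22745, 802, 709.69),
    (22445, 2860, 3013.02),
    (20979, 6777, 7559.15),
    (30466, 21967, 21290.48),
]

def hosmer_lemeshow(groups):
    """Sum of (O - E)^2 / E over events and non-events in every group."""
    stat = 0.0
    for total, obs1, exp1 in groups:
        obs0 = total - obs1   # observed non-events
        exp0 = total - exp1   # expected non-events
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat

hl_stat = hosmer_lemeshow(groups)
hl_df = len(groups) - 2   # degrees of freedom: number of groups minus 2

print(f"HL chi-square = {hl_stat:.2f} on {hl_df} DF")
for g, (total, obs1, exp1) in enumerate(groups, start=1):
    print(f"group {g:2d}: observed rate {obs1 / total:.4f}, expected rate {exp1 / total:.4f}")
```

The per-group printout makes it easier to see where observed and expected event rates diverge most, which is what drives the large chi-square.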
Dear all modelling experts,
I have built an attrition model using logistic regression, with a two-month operational gap, to estimate the probability that a customer will churn. In model development the fit looks very good, with ROC (.94) and Gini (.75). However, when I look at the predicted probability of the event (P_1), the scores look rather low even though the events did occur. Could anyone suggest how I could adjust the probability scores? For example, if I target the top 3 deciles of the population, I have to include a customer in decile 1 (the top 10% of the population) whose predicted chance of attrition is only 30%, and in the real world that customer does churn. In that case, how could I reweight that customer to 90% in decile 1? I have tried reweighting by AIC or -2 Log L, but the scores stay the same. Expert suggestions would be greatly appreciated.
Please find below the average score distribution from the model scores:
Row Labels | Average of SCORE_1 | Actually closed | # RISK Customer | Grand |
0 | 0.858566898 | 485 | 16531 | 0.029338818 |
1 | 0.602700652 | 345 | 16322 | 0.021137116 |
2 | 0.459997375 | 321 | 16343 | 0.019641437 |
3 | 0.25116008 | 401 | 16661 | 0.024068183 |
4 | 0.13434664 | 523 | 16014 | 0.032658923 |
5 | 0.070197285 | 322 | 18175 | 0.017716644 |
6 | 0.039646414 | 429 | 17278 | 0.024829263 |
7 | 0.03082105 | 356 | 13737 | 0.025915411 |
8 | 0.019729811 | 307 | 17477 | 0.017565944 |
9 | 0.009666523 | 278 | 15341 | 0.018121374 |
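To make the gap in the table above concrete, the "Grand" column can be reproduced as "Actually closed" divided by "# RISK Customer" and set beside the average score for each decile. This is a small Python sketch rather than SAS, with illustrative names; the reading of "Grand" as that ratio is an assumption, although it matches the printed figures.

```python
# (decile, average SCORE_1, actually closed, risk customers, reported "Grand")
rows = [
    (0, 0.858566898, 485, 16531, 0.029338818),
    (1, 0.602700652, 345, 16322, 0.021137116),
    (2, 0.459997375, 321, 16343, 0.019641437),
    (3, 0.25116008, 401, 16661, 0.024068183),
    (4, 0.13434664, 523, 16014, 0.032658923),
    (5, 0.070197285, 322, 18175, 0.017716644),
    (6, 0.039646414, 429, 17278, 0.024829263),
    (7, 0.03082105, 356, 13737, 0.025915411),
    (8, 0.019729811, 307, 17477, 0.017565944),
    (9, 0.009666523, 278, 15341, 0.018121374),
]

# Observed closure rate per decile = closed / customers
observed = [closed / customers for _, _, closed, customers, _ in rows]

# Overall closure rate across all deciles
overall = sum(r[2] for r in rows) / sum(r[3] for r in rows)

print("decile  avg_score  observed_rate  gap")
for (decile, avg_score, _, _, _), rate in zip(rows, observed):
    print(f"{decile:6d}  {avg_score:9.4f}  {rate:13.4f}  {avg_score - rate:+.4f}")
print(f"overall observed closure rate: {overall:.4f}")
```

Read this way, the average score runs from about 0.86 down to 0.01 across the deciles, while the observed closure rate stays between roughly 1.8% and 3.3% in every decile.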
Regards,
Mou