ManOfHonor
Obsidian | Level 7

Hi, guys.

I am building a scorecard model in Enterprise Miner (process flow diagram attached).

And I have run into several problems:

1) Very few target events: the target is def_6_30 (30+ days overdue within 6 months).

To overcome this limitation I added a frequency variable, but I suspect it biases the model (screenshot attached).

2) I cannot get a stable model.


(Screenshots attached: model results before reject inference.)

3) The train/validation Gini varies drastically: before reject inference, with a 50/50 stratified data partition, train Gini = 0.52 and validation Gini = 0.49.

 


And I have some questions:

1) How can I estimate the bad rate and approval rate of a scorecard model?

All I need is to improve the old scorecard model. To achieve this I tried to exclude predictors (starting from the lowest information value) and to add new ones (with high IV). Honestly speaking, I have no other ideas for doing that.
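
For reference, this is roughly how the information value of one already-binned candidate predictor can be computed outside of the Interactive Grouping node; the data set (work.apps), the bin variable (bin_income), and the 0/1 target coding are made-up names, just for illustration:

/* Hypothetical sketch: WOE and IV for one binned predictor.              */
/* work.apps, bin_income, and the 0/1 target def_6_30 are made-up names.  */
proc sql;
   /* per-bin counts of goods and bads */
   create table work.bin_counts as
   select bin_income,
          sum(def_6_30 = 1) as bads,
          sum(def_6_30 = 0) as goods
   from work.apps
   group by bin_income;

   /* distributions, WOE, and per-bin IV contribution                      */
   /* (bins with zero goods or zero bads need merging or smoothing first)  */
   create table work.woe_iv as
   select bin_income,
          goods / sum(goods)                                            as dist_good,
          bads  / sum(bads)                                             as dist_bad,
          log(calculated dist_good / calculated dist_bad)               as woe,
          (calculated dist_good - calculated dist_bad) * calculated woe as iv_part
   from work.bin_counts;

   /* total IV for this predictor = sum of the per-bin contributions */
   select sum(iv_part) as information_value from work.woe_iv;
quit;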

 

P.S. I used Naeem Siddiqi's book and the reference shown in the attached image.

1 REPLY
DougWielenga
SAS Employee

Unfortunately, there is only so much stability you can get out of the small number of observations you are working with. I would also be concerned that there would be great variability in the Train/Validate model metrics for slight changes in the model (again, a stability issue). Inflating the frequency improves the calculated percentage of events, but it doesn't really give the model any more information.
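
To make that concrete, here is a hypothetical sketch (the data set and variable names are invented): a FREQ variable makes each bad count many times, which raises the apparent event rate, but the model still only sees the same handful of distinct bad observations, and the standard errors shrink as if the sample were genuinely larger.

/* Hypothetical illustration only -- work.apps, def_6_30, and the inputs are made-up names. */
data work.inflated;
   set work.apps;
   freq_wt = ifn(def_6_30 = 1, 10, 1);   /* count every bad 10 times, every good once */
run;

proc logistic data=work.inflated;
   freq freq_wt;                         /* inflates the apparent N; no new information is added */
   model def_6_30(event='1') = income age_months n_inquiries;
run;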

 

You have few enough observations that I would consider not splitting at all and then doing some type of bootstrap modeling to try to get at a small set of predictors for fitting a model. It also might be useful to consider a one-split Tree model to see if there are any nodes that almost never have the event. You could then attempt to do your analysis on the remaining nodes, which would by definition have a higher percentage of the event of interest. In this way, you could pre-filter the data into a second flow where you model the observations that have a higher chance of the target event. This could provide a better solution than fitting against the whole data set, given how rare your event is. You would have to deploy the score code conditionally, depending on whether a given observation was pre-filtered (and scored by the second model) or whether it was left to be scored by the Tree model.
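
A rough sketch of the bootstrap idea in code, outside of the Enterprise Miner flow (all data set and variable names below are invented), would be to resample the unsplit data with replacement, refit the candidate model on each replicate, and then look at how stable the coefficients are across replicates:

/* Hypothetical sketch: bootstrap the unsplit data and refit per replicate. */
/* work.apps, def_6_30, and the candidate inputs are made-up names.         */
proc surveyselect data=work.apps out=work.boot
                  method=urs samprate=1 outhits reps=200 seed=12345;
run;

proc logistic data=work.boot noprint outest=work.boot_est;
   by replicate;
   model def_6_30(event='1') = income age_months n_inquiries utilization;
run;

/* Coefficients that flip sign or swing wildly across replicates are        */
/* candidates to drop; the stable ones form the small predictor set to keep. */
proc means data=work.boot_est n mean std min max;
   var income age_months n_inquiries utilization;
run;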

 

Again, this doesn't solve your problem of having so few events to begin with, but you might find it easier to model the remaining observations after removing those with almost no chance of having the target event. SAS Enterprise Miner was designed for data mining, where data sets are often extremely large, so this approach requires a bit more work, but it should be easy to implement.
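
As a sketch of what that pre-filter could look like in code (again, every data set, variable, cutoff, and file name below is invented; in Enterprise Miner you would do the equivalent with a Decision Tree node and a Filter node):

/* 1) Hypothetical one-split tree: look for a node that almost never has the event. */
proc hpsplit data=work.apps maxdepth=1;
   class def_6_30;
   model def_6_30 = utilization income age_months;
   output out=work.apps_nodes;   /* per-observation node/leaf assignments (e.g., _Leaf_) */
run;

/* 2) Second flow: keep only the segment that still shows a meaningful event rate.  */
/*    Suppose the near-zero-event segment came out as _Leaf_ = 1 in the output.     */
data work.apps_risky;
   set work.apps_nodes;
   if _Leaf_ ne 1;
run;

/* 3) Conditional deployment: the rule recovered from the one-split tree routes      */
/*    each new observation either to a fixed near-zero score or to the score code    */
/*    of the model built on the risky segment.                                       */
data work.scored;
   set work.new_applications;
   if utilization < 0.05 then do;            /* placeholder for the tree's split rule */
      P_def_6_301 = 0.001;                   /* near-zero-event segment               */
   end;
   else do;
      /* %include '/models/second_stage_score.sas';  score code from the second flow */
   end;
run;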

 

Hope this helps!

Doug 

