12-01-2014 03:18 AM
I'm building a predictive model using SAS E-Miner Credit Scoring, with data from 2007 to 2012. We have a factor called "Terms" whose values in the current data (2013-2014) have changed significantly. The average term in the model data is around 30 terms. Two years later, terms are getting longer (average 48 terms) because some practices have changed. If I continue using "Terms" in my model, I think it will no longer be useful and I will have to rebuild the model very soon. For example,
<= 12 terms = 10 points
13-24 terms = 5 points
25-36 terms = 0 points
> 36 terms = -5 points
With this scorecard, every new customer will receive -5 points, because nobody has a term shorter than 36 any more. The proportion of customers in the other groups will be 0.
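The effect described above can be sketched in a few lines of Python. This is only an illustration: the function name `term_points` and both lists of term values are made up, and a term of exactly 36 is assumed to fall in the 25-36 bin.

```python
def term_points(term):
    """Scorecard points for the term factor, using the bins from the example."""
    if term <= 12:
        return 10
    if term <= 24:
        return 5
    if term <= 36:  # assumption: a term of exactly 36 belongs to the 25-36 bin
        return 0
    return -5


# Hypothetical term values, chosen only to illustrate the shift.
development_terms = [6, 12, 18, 24, 30, 48]
recent_terms = [42, 48, 48, 54, 60]

development_points = [term_points(t) for t in development_terms]
recent_points = [term_points(t) for t in recent_terms]

print(development_points)
print(recent_points)
```

Once every recent customer lands in the last bin, the factor contributes the same points to everyone and no longer separates applicants.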
So I decided to standardize the factor, and I am considering two methods.
1) To find Z we need the SD, the number of observations, and the average.
I fix the average for all records to the average of the data used in the model. This means all records will have the same N and average.
2) N, the average, and the SD are not fixed. They vary by date.
Below are examples of the two groups
From the above example, the Z values of the two groups differ, which can affect the range of the score.
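A minimal sketch of the two standardization options, using made-up numbers. The term values are hypothetical and were picked only so that the averages match the 30 and 48 mentioned earlier. The helper `z_scores` is an illustrative name, and the population SD is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values, center, spread):
    """Standardize values as (value - center) / spread."""
    return [(v - center) / spread for v in values]


# Hypothetical data: development sample averages 30, recent sample averages 48.
dev_terms = [12, 24, 30, 36, 48]
recent_terms = [36, 48, 48, 60]

# Method 1: average and SD are fixed to the development (model) data.
dev_mean, dev_sd = mean(dev_terms), pstdev(dev_terms)
z_fixed = z_scores(recent_terms, dev_mean, dev_sd)

# Method 2: average and SD are recomputed from each period's own data.
z_period = z_scores(recent_terms, mean(recent_terms), pstdev(recent_terms))

print(mean(z_fixed))   # average Z of recent records against the model baseline
print(mean(z_period))  # average Z of recent records against their own period
```

With a fixed baseline, the recent records sit well above zero on average, so the drift in term length stays visible in Z. When the average and SD are recomputed per period, the average Z of each period is zero by construction, so Z then describes a customer's position within their own period, not against the model data.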
I would appreciate your suggestions on which method of calculating the Z value is better.
My background is not in statistics or mathematics, but I am really interested in data modelling and would really appreciate your support.
Thanks in advance.
12-02-2014 10:35 AM
Not a statistician here, but I am a big fan of the Interactive Grouping and Scorecard nodes in Enterprise Miner. I used to do everything by hand back in the day, and these nodes do a lot of the hard work behind the scenes.
Two things that I think you could improve:
You are using data from 2007 to 2012. This time window is too large. The standard is to use 12-month periods, but it could vary depending on whether it is new applications vs. behavior, the specific credit practice, or country regulations. I would rather use yearly periods, e.g. use June 2010 through May 2011 to predict the next year. I would think that this way the variable TERMS would be more stable.
Instead of using the moving average to determine how to bin or group the variable TERMS, I would use the Interactive Grouping node in the Credit Scoring tab directly. This node bins the variables for you and, most importantly, calculates and graphs the weight of evidence of each bin. It also calculates the Information Value and only passes the variables that meet a threshold to the next nodes in your diagram (think of a Scorecard node).
I would much rather use weight of evidence and information value than moving average to assess a variable.
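For readers who want to see what these two measures are, here is a rough Python sketch using their common textbook definitions: weight of evidence as the log of the share of goods over the share of bads in a bin, and information value as the sum over bins of the difference in shares times the weight of evidence. It is not the Interactive Grouping node's implementation. The function `woe_iv` and the good/bad counts are hypothetical.

```python
import math


def woe_iv(bins):
    """Return (WoE per bin, total IV) for a dict of label -> (good_count, bad_count).

    Assumes every bin has at least one good and one bad; zero counts would
    need smoothing or merging of bins before taking the logarithm.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv


# Hypothetical good/bad counts per TERMS bin, for illustration only.
counts = {
    "<=12": (400, 10),
    "13-24": (300, 20),
    "25-36": (200, 30),
    ">36": (100, 40),
}

woe, iv = woe_iv(counts)
for label, value in woe.items():
    print(label, round(value, 3))
print("IV:", round(iv, 3))
```

In this made-up data the weight of evidence falls steadily from the shortest to the longest bin, which is the kind of pattern the node's graph lets you check by eye.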
You can learn more about these nodes in the reference help. Press F1 if you are on Enterprise Miner.
This book is also a must read for credit scoring professionals:
I hope this helps,