Credit scorecards have long been the standard model for credit scoring because they are easy to interpret and make it easy to score new data, that is, to calculate a credit score for new customers. This tip walks you through the basic steps to build a credit scorecard using Credit Scoring for SAS® Enterprise Miner™. It is the first in a series of tips that I will be posting on credit scoring.
Building a Scorecard
The nodes in the basic flow diagram to build a credit scorecard are: Input Data Source, Data Partition, Interactive Grouping, and Scorecard. For this example you can use the German Credit data set available from the Help menu of SAS Enterprise Miner: click Help -> Generate Sample Data Source -> German Credit. This data set has a binary target, good_bad, that indicates whether a customer defaulted on monthly payments (designated with the value 'BAD'), as well as several other variables related to demographics and credit bureau information that serve as inputs, or characteristics.
Interactive Grouping Node
In a nutshell, the Interactive Grouping node is a very flexible tool for binning or grouping your variables. This node:
bins your input variables using options you can easily tweak
calculates the weight of evidence of the bins for each input variable
calculates Gini and Information Value, and rejects input variables with a low value of these statistics
The procedures running behind the scenes find the optimal binning of the inputs with respect to the target, subject to certain constraints that you can easily customize. Make sure you use the interactive application of the node to visually confirm that the event counts and weight of evidence trend make sense for your binning. If necessary, you can merge bins, create new groups, or manually adjust the weight of evidence.
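To make the two statistics concrete, here is a minimal Python sketch of how weight of evidence and Information Value are commonly computed from per-bin counts. It assumes the convention WOE = ln(share of non-events / share of events); the function name and the counts are made up for illustration and are not taken from the German Credit data or from the node's internal code.

```python
import math

def woe_and_iv(bins):
    """bins: list of (n_nonevent, n_event) counts, one tuple per bin.

    Returns (list of WOE values per bin, information value).
    Assumes every bin has at least one event and one non-event.
    """
    total_nonevent = sum(n for n, _ in bins)
    total_event = sum(e for _, e in bins)
    woes, iv = [], 0.0
    for n_nonevent, n_event in bins:
        dist_nonevent = n_nonevent / total_nonevent  # share of all non-events in this bin
        dist_event = n_event / total_event           # share of all events in this bin
        woe = math.log(dist_nonevent / dist_event)
        woes.append(woe)
        iv += (dist_nonevent - dist_event) * woe     # each bin's contribution is >= 0
    return woes, iv

# Illustrative counts only (hypothetical, not from the sample data)
bins = [(100, 50), (200, 50), (400, 50)]
woes, iv = woe_and_iv(bins)
```

Under this convention a bin with relatively few events gets a positive WOE, and a variable whose bins all look alike gets an Information Value close to zero, which is why low values lead to rejection.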
Manually adjusting the Weight of Evidence
For certain input variables you might need to manually adjust the weight of evidence (WOE). For example, the variable employed summarizes the number of years that a credit applicant has been employed at his current job. In general, years at current job tends to be inversely related to credit default. The fact that the weight of evidence does not decrease monotonically for groups 1 through 5 on this data set can be due to a number of reasons. For example, this data set might be sample-biased because many applications with employed<2 were hand-selected, or "cherry-picked", and their good behavior is reflected in a low event count and low weight of evidence. To prevent this sample bias from affecting your scorecard, you can use the Manual WOE column on the Coarse Detail view of the Groupings tab in the interactive application. Change the WOE from 0.1283 to 0.7 for group 1 and from -0.13131 to -0.5 for group 2. Notice that the new weight of evidence is plotted as New WOE and the information value is recalculated as New Information Value.
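If you want to check a WOE trend outside the interactive application, a small helper like the following can flag groupings that are not monotonic and show the effect of a manual override. This is only a sketch: the function names are hypothetical, and the WOE values are invented for illustration rather than taken from the employed variable.

```python
def is_monotonic(values):
    """True if the sequence never changes direction (all non-increasing or all non-decreasing)."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def apply_manual_woe(woes, overrides):
    """Return a copy of woes with manual values substituted.

    overrides maps a 1-based group number to its manual WOE.
    """
    return [overrides.get(group, woe) for group, woe in enumerate(woes, start=1)]

# Hypothetical WOE values for five groups; group 3 breaks the downward trend
hypothetical_woe = [0.9, 0.2, 0.35, -0.3, -0.8]
adjusted_woe = apply_manual_woe(hypothetical_woe, {3: 0.0})

trend_ok_before = is_monotonic(hypothetical_woe)
trend_ok_after = is_monotonic(adjusted_woe)
```

A check like this only tells you that the trend is broken, not why. Whether an override is justified is still a business judgment, as in the cherry-picking example above.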
Scorecard Node
Once you are satisfied with the bins or groups you found with the Interactive Grouping node, run the Scorecard node to fit a logistic regression on your grouped inputs. The node then applies a linear transformation to the predicted log odds for each input group, or attribute, to produce scorepoints that are much easier to interpret.
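One common way to split a scaled score across attributes, described in the scorecard literature, gives each characteristic an equal share of the intercept and the offset. The Python sketch below illustrates that allocation. It assumes the regression models the log odds of the event on WOE-coded inputs, and every coefficient, WOE value, and scaling constant in it is hypothetical. It is not a reproduction of what the Scorecard node does internally.

```python
import math

def attribute_points(woe, beta, intercept, n_characteristics, factor, offset):
    """Points for one attribute under an equal-share allocation.

    The minus sign turns log odds of the event into points where
    a higher score means lower risk.
    """
    return (-(woe * beta + intercept / n_characteristics) * factor
            + offset / n_characteristics)

# Hypothetical model: intercept and one coefficient per characteristic
intercept = -1.0
betas = {"employed": -0.8, "purpose": -0.6}
# Hypothetical WOE of the attributes one applicant falls into
applicant_woe = {"employed": 0.5, "purpose": -0.2}
# Hypothetical scaling constants
factor, offset = 20 / math.log(2), 100.0

points = {
    name: attribute_points(applicant_woe[name], betas[name], intercept,
                           len(betas), factor, offset)
    for name in betas
}
total = sum(points.values())
```

Because the intercept and offset shares add back up, the attribute points sum to the same value you would get by scaling the applicant's overall predicted log odds in one step.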
By default, a difference of 20 scorepoints corresponds to a doubling of the odds. The event you are modeling is payment default and higher scores indicate lower risk, so each decrease of 20 scorepoints doubles the odds of the event. For example, an application scored with 130 points has double the odds of defaulting compared to an application with a score of 150.
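The arithmetic behind this scaling can be sketched in a few lines of Python. The sketch reads the odds as non-event to event (good:bad), which is the reading consistent with the 130-versus-150 example, and it assumes a reference point of 200 scorepoints at odds of 50:1 together with 20 points to double the odds. Treat those reference values as assumptions and check the scaling properties of your own Scorecard node.

```python
import math

# Assumed scaling settings (verify against your node's properties)
POINTS_TO_DOUBLE_ODDS = 20
REFERENCE_SCORE = 200
REFERENCE_ODDS = 50  # non-event : event odds at the reference score

factor = POINTS_TO_DOUBLE_ODDS / math.log(2)
offset = REFERENCE_SCORE - factor * math.log(REFERENCE_ODDS)

def score_from_odds(odds_nonevent):
    """Scorepoints for given non-event:event odds."""
    return offset + factor * math.log(odds_nonevent)

def odds_from_score(score):
    """Non-event:event odds implied by a score."""
    return math.exp((score - offset) / factor)

# Odds of default are the reciprocal of the non-event:event odds
default_odds_130 = 1 / odds_from_score(130)
default_odds_150 = 1 / odds_from_score(150)
ratio = default_odds_130 / default_odds_150
```

Here ratio comes out as 2 (up to floating-point rounding): the application with 130 points has twice the odds of defaulting as the one with 150 points, regardless of the reference point chosen.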
In the results, there are several useful plots and tables including the scorecard, the score distribution, the KS plot, the trade-off plot, and many others.
Output variables and Adverse Characteristics
Notice from the exported data sets that the Scorecard node creates several variables. The variables with prefix SCR_ are the scorecard points for each variable in the scorecard, and SCORECARD_POINTS is the total points for each application.
When you specify the Scorecard property Generate Report=Yes to output the Adverse Characteristics, your results will also include the variables that decreased the scorepoints the most for each observation. You can select up to 5 adverse characteristics. As an example of how to interpret these columns, for the first observation in the data set below, 14 scorepoints were deducted because the purpose of the loan was labeled either 1, 3, 8, missing, or unknown.
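The relationship between the SCR_ columns, the total, and the adverse characteristics can be illustrated with a short Python sketch. It assumes that a deduction is measured against a per-characteristic reference value (here, hypothetically, the maximum points attainable for that characteristic). The node may use a different reference, and the column names after the SCR_ prefix and all point values below are invented for illustration.

```python
def total_points(row):
    """Sum the per-characteristic scorepoints (columns prefixed SCR_)."""
    return sum(value for name, value in row.items() if name.startswith("SCR_"))

def adverse_characteristics(row, reference, top_n=5):
    """Characteristics with the largest shortfall below their reference points.

    reference maps each SCR_ column to the points used as the baseline
    (an assumption of this sketch). Returns up to top_n
    (column, points_deducted) pairs, largest deduction first.
    """
    gaps = {
        name: reference[name] - value
        for name, value in row.items()
        if name.startswith("SCR_")
    }
    ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
    return [(name, gap) for name, gap in ranked if gap > 0][:top_n]

# Hypothetical scored observation and hypothetical reference points
row = {"SCR_purpose": 10, "SCR_employed": 30, "SCR_age": 25}
reference = {"SCR_purpose": 24, "SCR_employed": 35, "SCR_age": 25}

score = total_points(row)
reasons = adverse_characteristics(row, reference)
```

In this made-up observation the largest deduction is 14 points on the purpose characteristic, so it would be reported first, followed by the 5-point shortfall on employed.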
Recommended reading
SAS Enterprise Miner Reference Help: SAS Credit Scoring
Siddiqi, Naeem, Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, Cary, NC: SAS Press, 2005.