JakesVenter
Obsidian | Level 7

Hi guys,

 

I recently built a scorecard model using the credit scoring node in SAS Enterprise Miner. The scorecard proved to be very good: the Gini values are excellent, and the scores rank-order the events and non-events more than satisfactorily. However, the event modelled is extremely rare. In the sample, the events (1's) account for roughly 30%, whereas the true population proportion is 0.15%. The person who will be using the model is completely focused on having a cut-off below which all cases are classified as 1's when predicting. The problem I face now is that, because the event is so rare, the model DOES accurately capture a large percentage of events below certain cut-offs when I run it on out-of-time data, but it has an extremely large false positive rate as well, due to the large number of non-events in the population. I have adjusted the regression intercept for oversampling, but this does not seem to do much in terms of cut-offs. Are there any techniques that you would recommend?

 

Thanks!!

sbxkoenk
SAS Super FREQ

Hello,

 

There will always be a trade-off of course between:

- sensitivity = true positive rate (TPR) = hit rate = recall on the one hand AND

- precision = positive predictive value (PPV) on the other hand

 

Changing the cut-off (decision threshold) in a particular direction will improve one of these 2, but will worsen the other. Inevitably.

Also: The higher the false positive rate, the lower the precision.
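To make the effect of the event rate concrete, here is a minimal sketch in Python (not SAS) that applies Bayes' rule to a fixed operating point. The 80% sensitivity and 5% false positive rate are assumed numbers for illustration only; the 30% and 0.15% event rates are the ones from the question.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision (PPV) implied by a given TPR and FPR at a given event rate."""
    true_pos = tpr * prevalence            # share of all cases that are true positives
    false_pos = fpr * (1.0 - prevalence)   # share of all cases that are false positives
    return true_pos / (true_pos + false_pos)

# Assumed operating point for illustration: 80% sensitivity, 5% false positive rate
tpr, fpr = 0.80, 0.05

sample_ppv = precision_at_prevalence(tpr, fpr, 0.30)        # event rate in the modelling sample
population_ppv = precision_at_prevalence(tpr, fpr, 0.0015)  # event rate in the population

print(f"precision in sample:     {sample_ppv:.3f}")
print(f"precision in population: {population_ppv:.3f}")
```

With these assumed inputs the same cut-off gives a precision of roughly 87% in the sample but only about 2% in the population, even though sensitivity and false positive rate are unchanged.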

 

Do you already use the Gains table and the Trade-off Plots in the Scorecard node?

 

The trade-off plots display the approval rate and bad rate against cutoff scores. In credit scoring, trade-off plots are used to show how the approval rate and the bad rate among the accepted applicants depend on the cutoff score. A good scorecard enables the choice of a cutoff score that corresponds to a relatively high approval rate with a relatively low bad rate.
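The numbers behind such a plot can be sketched by hand. The following Python snippet is only an illustration of the idea, not the Scorecard node's own calculation; the scores and outcomes are made-up toy data, and it assumes that higher scores mean lower risk and that applicants at or above the cutoff are accepted.

```python
def tradeoff_table(scores, is_bad, cutoffs):
    """Approval rate and bad rate among accepted cases for each cutoff score."""
    rows = []
    n = len(scores)
    for cutoff in cutoffs:
        # Outcomes of the accepted cases (score at or above the cutoff)
        accepted = [bad for score, bad in zip(scores, is_bad) if score >= cutoff]
        approval_rate = len(accepted) / n
        bad_rate = sum(accepted) / len(accepted) if accepted else 0.0
        rows.append((cutoff, approval_rate, bad_rate))
    return rows

# Toy data (hypothetical): 1 = bad, 0 = good
scores = [420, 450, 480, 510, 540, 570, 600, 630, 660, 690]
is_bad = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

rows = tradeoff_table(scores, is_bad, cutoffs=[450, 500, 600])
for cutoff, approval_rate, bad_rate in rows:
    print(f"cutoff {cutoff}: approval rate {approval_rate:.2f}, bad rate {bad_rate:.2f}")
```

Raising the cutoff lowers both the approval rate and the bad rate among the accepted cases, which is the trade-off the plot visualises.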

 

The gains table shows you "Average Marginal Profit" and "Average Total Profit" per score bucket using "Revenue Accepted Good" and "Cost Accepted Bad" (which you specify in the node properties). I think the online documentation (accessible from within Enterprise Miner) provides all the formulas.

 

If you don't want to rely on the Scorecard node for choosing your cut-off, you can always consider using the Cutoff node. It will choose the "best" cut-off probability according to the criterion of your choice (you can easily derive which score maps to it), for example the Kolmogorov–Smirnov statistic.
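As a rough illustration of a Kolmogorov–Smirnov based choice, the sketch below (Python, toy data) picks the cut-off probability that maximises the gap between the true positive rate and the false positive rate. It shows the general idea only; it is not claimed to reproduce the Cutoff node's exact computation, and the probabilities and labels are made up.

```python
def ks_cutoff(probs, labels):
    """Return (cutoff, ks): the cutoff maximising |TPR - FPR| when p >= cutoff is flagged."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_ks = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
        ks = abs(tp / n_pos - fp / n_neg)
        if ks > best_ks:  # strict comparison keeps the lowest cutoff on ties
            best_cut, best_ks = cut, ks
    return best_cut, best_ks

# Toy data (hypothetical): predicted event probabilities and actual outcomes
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cut, ks = ks_cutoff(probs, labels)
print(f"KS-optimal cutoff: {cut}, KS = {ks:.2f}")
```

Note that the KS criterion weighs events and non-events equally regardless of how rare the event is, so for a 0.15% event rate it may still leave many false positives in absolute terms.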

 

Also important: Enterprise Miner supports decision processing.

See

SAS® Enterprise Miner™ 14.1 Extension Nodes: Developer's Guide.

https://support.sas.com/documentation/cdl/en/emxndg/67980/PDF/default/emxndg.pdf

Appendix 3
Predictive Modeling
Decision Thresholds and Profit Charts (p. 178)

The final classification of a new applicant in the class of good or the class of bad risks will be based on profit considerations.
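A minimal sketch of what a profit-based decision rule can look like, in Python. The two-value profit structure (a benefit for each correctly flagged event, a cost for each false alarm, zero otherwise) and the amounts 500 and 5 are assumptions for illustration, not values from the guide. The probability passed in is assumed to be already adjusted to the population event rate.

```python
def breakeven_probability(benefit_true_positive, cost_false_positive):
    """Event probability above which flagging a case has positive expected profit."""
    return cost_false_positive / (benefit_true_positive + cost_false_positive)

def decide(p_event, benefit_true_positive, cost_false_positive):
    """Flag the case (1) only if the expected profit of flagging is positive."""
    expected_gain = (p_event * benefit_true_positive
                     - (1.0 - p_event) * cost_false_positive)
    return 1 if expected_gain > 0 else 0

# Hypothetical amounts: catching an event is worth 500, a false alarm costs 5
benefit, cost = 500.0, 5.0

threshold = breakeven_probability(benefit, cost)
print(f"flag when P(event) > {threshold:.4f}")
print(decide(0.02, benefit, cost))   # above the break-even probability
print(decide(0.005, benefit, cost))  # below the break-even probability
```

When a missed event is far more expensive than a false alarm, the break-even probability can be very low, which is often the case with rare events.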

 

Some people choose to optimize the F1-score as the best balance between sensitivity and precision.

F1-score is the harmonic mean of precision and sensitivity

See https://en.wikipedia.org/wiki/Precision_and_recall

If you want to maximize the F1-score, you can write an optimization to find the best cut-off, or simply run a simulation: let the cut-off vary from a start value to a stop value by a fixed increment and calculate the quality metrics that go with each cut-off. Then make a choice.
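The simulation approach could look like the following Python sketch. The probabilities, labels, and the grid from 0.1 to 0.9 in steps of 0.1 are toy assumptions; in practice you would run this on scored validation or out-of-time data that reflects the population event rate.

```python
def f1_at_cutoff(probs, labels, cut):
    """F1-score when every case with probability >= cut is classified as an event."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # harmonic mean

def sweep_cutoffs(probs, labels, start, stop, step):
    """Evaluate F1 on a grid of cutoffs from start to stop (inclusive)."""
    n_steps = int(round((stop - start) / step))
    results = []
    for i in range(n_steps + 1):
        cut = round(start + i * step, 10)  # avoid floating-point drift in the grid
        results.append((cut, f1_at_cutoff(probs, labels, cut)))
    return results

# Toy data (hypothetical): predicted event probabilities and actual outcomes
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

results = sweep_cutoffs(probs, labels, start=0.1, stop=0.9, step=0.1)
best_cut, best_f1 = max(results, key=lambda row: row[1])
print(f"best cutoff: {best_cut}, F1 = {best_f1:.3f}")
```

The same loop can record precision, sensitivity, and the number of flagged cases per cut-off, so the choice can be discussed with the person using the model.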

 

Good luck,

Koen

 

JakesVenter
Obsidian | Level 7

Thanks for the help Koen! I will have a look at the different techniques you mentioned and see which one shows the best performance/classification.
