bikashten
Fluorite | Level 6

Hi all,

I have a question about weight of evidence (WOE) for credit score modeling. I have found that researchers use two contrasting methods to calculate WOE.

 

1. Probability of default: Assuming that default is the event (y=1), I found that WOE is ln(distr of good/distr of bad) in Siddiqi's book, but some people use just the opposite: ln(distr of bad/distr of good).

 

2. Credit card application: Assuming that credit card approval is the event (y=1), WOE is ln(distr of bad/distr of good). Is that right, or is it just the opposite?

 

Either way, do we get the same IV value?

 

The SAS book says that WOE depends on how the event and non-event are defined, and gives this log ratio: ln(% non-event / % event).

 

I am really confused. Can anyone help?

 

Thanks,

Bikash

4 REPLIES
PaigeMiller
Diamond | Level 26

It really doesn't matter.

 

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

 

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

 

So which do you prefer, big numbers are good, or small numbers are good?
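The point above, that the two formulas carry the same information with opposite sign, can be sketched numerically. This is a minimal illustration in Python rather than SAS, only to keep it self-contained; the bin names and counts are invented, and both function names are hypothetical.

```python
import math

# Hypothetical (good, bad) counts per bin, invented for illustration.
bins = {"low": (100, 60), "mid": (300, 30), "high": (600, 10)}

total_good = sum(good for good, _ in bins.values())
total_bad = sum(bad for _, bad in bins.values())

def woe_good_over_bad(good, bad):
    """ln(distr of good / distr of bad) for one bin."""
    return math.log((good / total_good) / (bad / total_bad))

def woe_bad_over_good(good, bad):
    """ln(distr of bad / distr of good) for one bin."""
    return math.log((bad / total_bad) / (good / total_good))

for name, (good, bad) in bins.items():
    a = woe_good_over_bad(good, bad)
    b = woe_bad_over_good(good, bad)
    # The two conventions differ only in sign, bin by bin.
    print(f"{name}: good/bad = {a:+.4f}, bad/good = {b:+.4f}")
```

With these made-up counts, the bin holding a larger share of the bads than of the goods gets a negative value under the first convention and a positive one under the second.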

--
Paige Miller
bikashten
Fluorite | Level 6

Hi Paige,

These two statements are not always right:

 

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

 

For WOE, I am looking for the ratio of the % of goods to the % of bads, not the number of goods divided by the number of bads within a particular category. I also thought the same thing: it does not matter which way we formulate the WOE, because we will get the same IV value for gauging the variable's predictive power.

 

For the sake of easy interpretation, I have seen a couple of papers using this formula too:

WOE_ij = log( P(X_j ∈ B_i | Y=1) / P(X_j ∈ B_i | Y=0) )
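A small sketch may help separate the two ratios discussed in this post: the ratio of class-conditional shares used in the formula, and the raw ratio of counts inside a bin. It is written in Python with invented counts; the bin labels and function names are hypothetical, and which outcome is coded Y=1 is left open on purpose.

```python
import math

# Hypothetical counts per bin B_i: (records with Y=1, records with Y=0).
counts = {"B1": (60, 100), "B2": (30, 300), "B3": (10, 600)}

n1_total = sum(n1 for n1, _ in counts.values())  # all records with Y=1
n0_total = sum(n0 for _, n0 in counts.values())  # all records with Y=0

def woe(n1, n0):
    """log( P(X in B_i | Y=1) / P(X in B_i | Y=0) ) for one bin."""
    return math.log((n1 / n1_total) / (n0 / n0_total))

def within_bin_log_ratio(n1, n0):
    """log of the raw count ratio inside one bin (not the WOE)."""
    return math.log(n1 / n0)

# The difference between the two quantities, bin by bin.
offsets = [within_bin_log_ratio(n1, n0) - woe(n1, n0)
           for n1, n0 in counts.values()]
print(offsets)
```

In this setup the two quantities differ by the same constant in every bin, log(n1_total / n0_total), so they rank the bins identically but are not interchangeable as values.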

 

Thanks,

Bikash

 

Ksharp
Super User

For your first Q:

Both are right. They differ only in the sign of the coefficient (+beta versus -beta).

All you need to check is that the high-score group has the lower bad percentage.

 

ln(distr of good/distr of bad), as in the Siddiqi book  --->  model good_bad(event='bad')=

ln(distr of bad/distr of good), the opposite form  --->  model good_bad(event='good')=

 

 

For your second Q:

"WOE is ln(distr of bad/distr of good)." should go with "model y(event='0')=".

But you need to check whether the higher scores have the lower bad percentage. If not, switch to "model y(event='1')=".
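The check described above could be sketched as follows. This is one strict reading of it (the bad rate never rises as the score band goes up), written in Python with invented counts; the function name and the band layout are assumptions, not something from the SAS procedure.

```python
def bad_rate_falls_with_score(bands):
    """Return True if the bad rate never increases from one band to the next.

    `bands` is a list of (good, bad) counts ordered from the lowest
    score band to the highest. Hypothetical helper for illustration.
    """
    rates = [bad / (good + bad) for good, bad in bands]
    return all(later <= earlier for earlier, later in zip(rates, rates[1:]))

# Hypothetical score bands, lowest score first.
bands = [(100, 60), (300, 30), (600, 10)]

if bad_rate_falls_with_score(bands):
    print("Higher scores have lower bad rates: keep the current event= choice.")
else:
    print("Ordering is reversed: consider switching the event= level.")
```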

 

"Either way, do we get the same IV value?"

Yes. Both give the same IV.
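To see why, note that swapping good and bad flips the sign of both the difference in shares and the log ratio, so each bin's product is unchanged. A minimal Python sketch with invented counts follows; the function name and its `good_over_bad` flag are hypothetical, and every bin is assumed to have nonzero good and bad counts.

```python
import math

# Hypothetical (good, bad) counts per bin; no bin may contain a zero count.
counts = [(100, 60), (300, 30), (600, 10)]

total_good = sum(good for good, _ in counts)
total_bad = sum(bad for _, bad in counts)

def information_value(good_over_bad=True):
    """Sum over bins of (difference in shares) * WOE, under either convention."""
    iv = 0.0
    for good, bad in counts:
        p_good = good / total_good  # share of all goods in this bin
        p_bad = bad / total_bad     # share of all bads in this bin
        if good_over_bad:
            iv += (p_good - p_bad) * math.log(p_good / p_bad)
        else:
            iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv

iv_a = information_value(good_over_bad=True)
iv_b = information_value(good_over_bad=False)
print(iv_a, iv_b)  # the two conventions agree up to floating-point rounding
```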

 

 

bikashten
Fluorite | Level 6

Thanks Ksharp for your clarification. 
