Hi all,
I have a question about weight of evidence (WOE) for credit score modeling. I found that researchers are using two contrasting methods to calculate the WOE.
1. Probability of default: Taking default as the event (y=1), I found that WOE is ln(distr of good/distr of bad) in Siddiqi's book, but some are using just the opposite, like this: ln(distr of bad/distr of good).
2. Credit card application: Taking credit card approval as the event (y=1), WOE is ln(distr of bad/distr of good). Is that right, or is it just the opposite?
Either way, do we get the same IV value?
The SAS book says that WOE depends on how the event and non-event are defined, and it gives this log ratio: ln(% non event / % event).
I am really confused. Any help would be appreciated.
Thanks,
Bikash
It really doesn't matter.
If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.
If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.
So which do you prefer, big numbers are good, or small numbers are good?
Paige Miller
Hi Paige,
These two statements are not always right:
If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.
If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.
For WOE, I am looking at the ratio of % of good to % of bad, not the total number of good over the total number of bad for a particular category. I also thought the same thing: it does not matter which way we formulate the WOE, since we get the same IV value for assessing predictive power.
For the sake of easy interpretation, I have seen a couple of papers using this formula too:
WOE_ij = log( P(X_j ∈ B_i | Y=1) / P(X_j ∈ B_i | Y=0) ).
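To make the link between the conventions explicit, here is a short sketch of the algebra. The symbols g_i and b_i are my own shorthand (not from the papers or the books): g_i is the share of all goods that fall in bin i, and b_i is the share of all bads that fall in bin i. If Y=1 denotes "bad", the formula above is the ln(distr of bad/distr of good) form.

```latex
% g_i = share of all goods in bin i, b_i = share of all bads in bin i (assumed notation)
\[
\mathrm{WOE}_i^{(G/B)} = \ln\frac{g_i}{b_i}
  = -\ln\frac{b_i}{g_i}
  = -\,\mathrm{WOE}_i^{(B/G)}
\]
\[
\mathrm{IV} = \sum_i (g_i - b_i)\,\ln\frac{g_i}{b_i}
  = \sum_i (b_i - g_i)\,\ln\frac{b_i}{g_i}
\]
```

Swapping numerator and denominator flips the sign of both factors in each IV term, so every term, and therefore the sum, stays the same.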
Thanks,
Bikash
For your first Q:
Both are right. The only difference is the sign of the coefficient: you get +beta with one and -beta with the other.
All you need to check is that the high-score group has the lower bad percentage.
ln(distr of good/distr of bad) in Siddiqi book, ---> model good_bad(event='bad')=
but some are using just opposite like this: ln(distr of bad/distr of good). ---> model good_bad(event='good')=
For your second Q:
"WOE is ln(distr of bad/distr of good). " should be "model y(event='0')= ".
But you need to check whether the higher scores have the lower bad percentage. If not, then switch to "model y(event='1')= "
"Either way, do we get the same IV value?"
Yes. Both have the same IV .
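A quick way to convince yourself is to compute both versions on a small table. The sketch below is only an illustration: the bin names and the good/bad counts are made up, and `woe_iv` is a hypothetical helper, not a function from SAS or any library.

```python
import math

# Hypothetical counts per bin as (goods, bads); made-up numbers for illustration.
bins = {"low": (100, 60), "mid": (300, 30), "high": (600, 10)}

def woe_iv(bins, good_over_bad=True):
    """Return (WOE per bin, IV) for the chosen ratio convention."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for name, (g, b) in bins.items():
        dist_good = g / total_good  # share of all goods in this bin
        dist_bad = b / total_bad    # share of all bads in this bin
        if good_over_bad:
            num, den = dist_good, dist_bad
        else:
            num, den = dist_bad, dist_good
        woe[name] = math.log(num / den)
        # Both factors change sign together when the ratio is inverted.
        iv += (num - den) * woe[name]
    return woe, iv

woe_gb, iv_gb = woe_iv(bins, good_over_bad=True)   # ln(distr of good/distr of bad)
woe_bg, iv_bg = woe_iv(bins, good_over_bad=False)  # ln(distr of bad/distr of good)

for name in bins:
    print(name, round(woe_gb[name], 4), round(woe_bg[name], 4))
print("IV:", round(iv_gb, 4), round(iv_bg, 4))
```

The WOE values for each bin differ only in sign between the two conventions, and the two IV values agree. Note that a bin with zero goods or zero bads would need some adjustment before taking the log, which this sketch does not handle.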
Thanks Ksharp for your clarification.