Weight of evidence

bikashten — Tue, 23 Apr 2019 17:40:50 GMT

Hi all,

I have a question about weight of evidence (WOE) for credit score modeling. I found that researchers use two contrasting methods to calculate the WOE.

1. Probability of default: Assuming that default is the event (y=1), I found that WOE is ln(distr of good/distr of bad) in Siddiqi's book, but some use just the opposite: ln(distr of bad/distr of good).

2. For credit card applications: Assuming credit card approval is the event (y=1), WOE is ln(distr of bad/distr of good). Is that right, or is it just the opposite?

Either way, do we get the same IV value?

The SAS book says that WOE depends on how the event and non-event are defined, and gives this log ratio: ln(% non-event / % event)

I am really confused. Please help!

Thanks,

Bikash

Re: Weight of evidence

PaigeMiller — Tue, 23 Apr 2019 17:49:44 GMT

It really doesn't matter.

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

So which do you prefer, big numbers are good, or small numbers are good?

Re: Weight of evidence

bikashten — Tue, 23 Apr 2019 18:18:18 GMT

Hi Paige,

These two statements are not always right:

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

For WOE, I am looking at the ratio of % of good to % of bad, not the total number of good / total number of bad for a particular category. I also thought the same thing: it does not matter which way we formulate the WOE, because either way we get the same IV value for measuring its predictive performance.

For the sake of easy interpretation, I have seen a couple of papers using this formula too:

WOE_ij = log( P(X_j ∈ B_i | Y=1) / P(X_j ∈ B_i | Y=0) ).

Thanks,

Bikash

Re: Weight of evidence

Ksharp — Wed, 24 Apr 2019 13:30:06 GMT

For your first Q:

Both are right. The only difference is the sign of the coefficient: +beta or -beta.

All you need to check is that the high-score group has the lower bad percentage.

ln(distr of good/distr of bad), as in Siddiqi's book ---> model good_bad(event='bad')=

ln(distr of bad/distr of good), the opposite that some use ---> model good_bad(event='good')=

For your second Q:

"WOE is ln(distr of bad/distr of good)." should be "model y(event='0')= ".

But you need to check whether the higher scores have the lower bad percentage. If not, switch to "model y(event='1')= ".

"Either way, do we get the same IV value?"

Yes. Both have the same IV.
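
To see why, here is a minimal sketch in Python (not SAS) that computes WOE both ways. The bin names and good/bad counts are made up for illustration, and `woe_and_iv` is a hypothetical helper, not a SAS or library function. It assumes IV = sum over bins of (distr A - distr B) * ln(distr A / distr B). Swapping A and B flips the sign of both factors, so each bin's contribution to IV is unchanged.

```python
import math

# Hypothetical counts of goods and bads per bin (illustrative numbers only).
bins = {
    "low":  {"good": 100, "bad": 60},
    "mid":  {"good": 300, "bad": 30},
    "high": {"good": 600, "bad": 10},
}

def woe_and_iv(bins, numerator, denominator):
    """Return per-bin WOE = ln(distr numerator / distr denominator) and total IV."""
    total_num = sum(b[numerator] for b in bins.values())
    total_den = sum(b[denominator] for b in bins.values())
    woe = {}
    iv = 0.0
    for name, b in bins.items():
        p_num = b[numerator] / total_num      # share of all "numerator" cases in this bin
        p_den = b[denominator] / total_den    # share of all "denominator" cases in this bin
        woe[name] = math.log(p_num / p_den)
        iv += (p_num - p_den) * woe[name]
    return woe, iv

woe_gb, iv_gb = woe_and_iv(bins, "good", "bad")   # ln(distr of good / distr of bad)
woe_bg, iv_bg = woe_and_iv(bins, "bad", "good")   # ln(distr of bad / distr of good)

for name in bins:
    print(name, round(woe_gb[name], 4), round(woe_bg[name], 4))
print("IV:", round(iv_gb, 4), round(iv_bg, 4))
```

The per-bin WOE values come out as exact negatives of each other, while the two IV totals agree. This sketch does not handle bins with zero goods or zero bads, which would need smoothing or merging before taking the log.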

Re: Weight of evidence

bikashten — Tue, 30 Apr 2019 12:18:31 GMT

Thanks Ksharp for your clarification.