New SAS User

Weight of evidence

bikashten · Posted 04-23-2019 01:40 PM

Hi all,

I have a question about weight of evidence (WOE) for credit scoring models. I found that researchers use two contrasting methods to calculate the WOE.

1. Probability of default: treating default as the event (y=1), Siddiqi's book defines WOE as ln(distr of good/distr of bad), but some authors use the opposite: ln(distr of bad/distr of good).

2. Credit card application: treating credit card approval as the event (y=1), is WOE ln(distr of bad/distr of good), or is it the opposite?

Either way, do we get the same IV value?

The SAS book says that WOE depends on how you define the event and the non-event, and gives this log ratio: ln(% non-event / % event).
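Under the SAS book's definition, which ratio you get depends only on which outcome is coded as the event. A sketch of the mapping, with bins indexed by $i$ and assuming, for the second case, that an approved application counts as the "good" outcome (an assumption, not something stated in the thread):

```latex
\mathrm{WOE}_i = \ln\frac{\%\,\text{non-event}_i}{\%\,\text{event}_i}
\;\Longrightarrow\;
\begin{cases}
\ln\dfrac{\text{distr good}_i}{\text{distr bad}_i}, & \text{event} = \text{default (bad)} \\[2ex]
\ln\dfrac{\text{distr bad}_i}{\text{distr good}_i}, & \text{event} = \text{approval (assumed good)}
\end{cases}
```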

I'm really confused. Help?

Thanks,

Bikash

PaigeMiller · Posted 04-23-2019 01:49 PM

It really doesn't matter.

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

So which do you prefer, big numbers are good, or small numbers are good?

--
Paige Miller
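A minimal Python sketch of the two conventions Paige describes, using made-up good/bad counts for three hypothetical bins (`low`, `mid`, `high`). Each WOE under one convention is the negative of the other, so only the direction of "better" flips:

```python
import math

# Hypothetical counts of goods and bads per bin of one characteristic
# (illustrative numbers only, not from the thread).
goods = {"low": 100, "mid": 300, "high": 600}
bads = {"low": 60, "mid": 30, "high": 10}

total_good = sum(goods.values())
total_bad = sum(bads.values())

def woe_good_over_bad(b):
    """ln(distr of good / distr of bad): bigger means a 'better' bin."""
    return math.log((goods[b] / total_good) / (bads[b] / total_bad))

def woe_bad_over_good(b):
    """ln(distr of bad / distr of good): bigger means a 'worse' bin."""
    return math.log((bads[b] / total_bad) / (goods[b] / total_good))

for b in goods:
    print(b, round(woe_good_over_bad(b), 3), round(woe_bad_over_good(b), 3))
```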

bikashten · Posted 04-23-2019 02:18 PM

Hi Paige,

These two statements are not always right:

If you use ln(distr of good/distr of bad) then big numbers are good, and small numbers are bad.

If you use ln(distr of bad/distr of good) then big numbers are bad, and small numbers are good.

For WOE, I am looking at the ratio of % of good to % of bad, not the raw count of goods to the raw count of bads within a particular category. I also thought the same thing: it does not matter which way we formulate the WOE, since we get the same IV value when assessing its predictive performance.
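To make that distinction concrete, the sketch below (hypothetical counts, plain Python) compares the distribution ratio used in WOE with the raw good:bad count ratio inside one bin. The two differ only by the constant ln(total good / total bad):

```python
import math

# Hypothetical counts for one bin and for the whole portfolio.
good_in_bin, bad_in_bin = 300, 30
total_good, total_bad = 1000, 100

# Ratio of distributions (what WOE uses): share of all goods vs share of all bads.
woe = math.log((good_in_bin / total_good) / (bad_in_bin / total_bad))

# Ratio of raw counts inside the bin (the bin's own log odds), which is not WOE.
log_odds_in_bin = math.log(good_in_bin / bad_in_bin)

# They differ by the portfolio log odds ln(total_good / total_bad), so WOE is 0
# exactly when the bin's good:bad mix matches the whole portfolio.
offset = math.log(total_good / total_bad)
print(woe, log_odds_in_bin, log_odds_in_bin - offset)
```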

For ease of interpretation, I have also seen a couple of papers use this formula:

$$\mathrm{WOE}_{ij} = \log\frac{P(X_j \in B_i \mid Y=1)}{P(X_j \in B_i \mid Y=0)}$$
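A sketch of that formula estimated from raw records, assuming `bins` holds the bin label of $X_j$ for each observation and `y` the 0/1 outcome. The function name `woe_by_bin` and the sample data are hypothetical, and no smoothing is applied for empty cells:

```python
import math
from collections import Counter

def woe_by_bin(bins, y):
    """Return {bin: ln(P(X in bin | Y=1) / P(X in bin | Y=0))}.

    bins: bin label of X_j for each observation
    y: 0/1 outcome for each observation (1 = the modelled event)
    Assumes every bin has at least one Y=1 and one Y=0 observation;
    an empty cell would raise an error because no smoothing is done.
    """
    n1 = sum(1 for v in y if v == 1)
    n0 = len(y) - n1
    c1 = Counter(b for b, v in zip(bins, y) if v == 1)  # bin counts given Y=1
    c0 = Counter(b for b, v in zip(bins, y) if v == 0)  # bin counts given Y=0
    return {b: math.log((c1[b] / n1) / (c0[b] / n0)) for b in sorted(set(bins))}

# Tiny made-up sample: bin "A" is event-heavy, bin "B" is non-event-heavy.
bins = ["A", "A", "A", "B", "B", "B", "B", "A"]
y = [1, 1, 0, 0, 0, 1, 0, 0]
print(woe_by_bin(bins, y))
```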

Thanks,

Bikash

Ksharp · Posted 04-24-2019 09:30 AM

For your first Q:

Both are right. They just give coefficients of opposite sign (+beta vs. -beta).

All you need to check is that the high-score group has a lower bad percentage.

ln(distr of good/distr of bad), as in Siddiqi's book ---> model good_bad(event='bad')=

ln(distr of bad/distr of good), the opposite convention ---> model good_bad(event='good')=
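One way to see the +beta/-beta point: for a single binned characteristic with WOE = ln(%good/%bad), each bin's log odds is an exact linear function of its WOE, so a logistic regression on that one WOE variable would fit slope -1 when the event is "bad" and +1 when it is "good". The sketch below checks that identity on made-up counts in plain Python; it does not call SAS:

```python
import math

# Hypothetical good/bad counts per bin (illustrative only).
goods = [100, 300, 600]
bads = [60, 30, 10]
G, B = sum(goods), sum(bads)

rows = []
for g, b in zip(goods, bads):
    woe = math.log((g / G) / (b / B))   # Siddiqi-style ln(%good / %bad)
    log_odds_bad = math.log(b / g)      # quantity modelled with event='bad'
    log_odds_good = math.log(g / b)     # quantity modelled with event='good'
    rows.append((woe, log_odds_bad, log_odds_good))

# Each bin's log odds is an exact linear function of its WOE:
#   event='bad' : log odds = -1 * WOE + ln(B/G)
#   event='good': log odds = +1 * WOE + ln(G/B)
for woe, lo_bad, lo_good in rows:
    print(f"WOE={woe:+.3f}  bad: {-woe + math.log(B / G):+.3f} vs {lo_bad:+.3f}  "
          f"good: {woe + math.log(G / B):+.3f} vs {lo_good:+.3f}")
```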

For your second Q:

"WOE is ln(distr of bad/distr of good)" should go with "model y(event='0')=".

But you need to check that the higher scores have the lower bad percentage. If not, switch to "model y(event='1')=".
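The check Ksharp describes can be sketched in Python as follows. `bad_rate_falls_with_score` is a hypothetical helper and the score bands are invented; it treats a bad rate that never rises across ascending score bands as the "correct direction" signal, and a False result as the cue to switch the event:

```python
def bad_rate_falls_with_score(bands):
    """bands: list of (score_band_lower_bound, n_good, n_bad) tuples.

    Returns True if the bad rate never increases as the score band rises.
    Hypothetical helper for illustration, not a SAS API.
    """
    ordered = sorted(bands, key=lambda t: t[0])
    rates = [n_bad / (n_good + n_bad) for _, n_good, n_bad in ordered]
    return all(later <= earlier for earlier, later in zip(rates, rates[1:]))

# Made-up score bands: higher scores have fewer bads, so the direction is right.
ok = [(500, 80, 20), (600, 180, 20), (700, 290, 10)]
# Same bands with the direction reversed (as if the event were flipped).
flipped = [(500, 290, 10), (600, 180, 20), (700, 80, 20)]

print(bad_rate_falls_with_score(ok), bad_rate_falls_with_score(flipped))
```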

"Either way, do we get the same IV value?"

Yes. Both give the same IV.
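Why the IV matches: each bin contributes (p_good - p_bad) * ln(p_good / p_bad), and swapping the convention negates both factors, leaving every product unchanged. A short Python check on hypothetical counts (`information_value` is an illustrative helper, not a SAS procedure):

```python
import math

def information_value(goods, bads, good_over_bad=True):
    """IV = sum over bins of (p_num - p_den) * ln(p_num / p_den).

    good_over_bad=True uses ln(%good / %bad); False uses ln(%bad / %good).
    """
    G, B = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        pg, pb = g / G, b / B
        num, den = (pg, pb) if good_over_bad else (pb, pg)
        iv += (num - den) * math.log(num / den)
    return iv

goods = [100, 300, 600]   # hypothetical counts per bin
bads = [60, 30, 10]
print(information_value(goods, bads, True), information_value(goods, bads, False))
```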

bikashten · Posted 04-30-2019 08:18 AM

Thanks, Ksharp, for the clarification.
