Ksharp
Super User

Hi Top,

I found this; it is NOT necessarily linear.

But in my opinion, non-linearity violates a GLM assumption, so it is better to make it look like a line.

 

(attached screenshot: 捕获3.PNG)

 

 

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

Thank you for following up.  It looks like Siddiqi encourages using grouping to find what he calls "logical relationships" which are basically linear trends in the sequence, and acknowledges that you may sacrifice information value to establish that relationship:  "The process of arriving at a logical trend is one of trial and error, in which one balances the creation of logical trends while maintaining a sufficient IV value."  I can see the attraction of using such trend groupings, as long as they don't detract from predictive power, because of their explanatory ability.
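(For concreteness, here is a minimal sketch, with made-up counts rather than anything from Siddiqi, of how per-bin WoE and total IV are usually computed, since that is the quantity being traded off against the logical trend:)

```python
import math

def woe_iv(events, non_events):
    """Per-bin WoE and total IV from event/non-event counts.

    WoE_i = ln(event_share_i / non_event_share_i)
    IV    = sum_i (event_share_i - non_event_share_i) * WoE_i
    """
    te, tn = sum(events), sum(non_events)
    woes, iv = [], 0.0
    for e, n in zip(events, non_events):
        de, dn = e / te, n / tn   # each bin's share of all events / non-events
        w = math.log(de / dn)
        woes.append(w)
        iv += (de - dn) * w
    return woes, iv

# Made-up counts for four bins, purely for illustration
woes, iv = woe_iv([100, 80, 60, 40], [40, 60, 80, 100])
print([round(w, 3) for w in woes], round(iv, 3))
# -> [0.916, 0.288, -0.288, -0.916] 0.434
```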

 

But I still differ with you about whether a non-linear appearance violates the GLM assumption.  The GLM assumption is about the relationship between the predictor and the (log odds of the) target; binning is designed specifically to try to establish that relationship, which might not exist between the target and the original predictor variable.  The linear appearance, however, is a relationship between the predictor and the bin sequence number, and is not relevant to the GLM assumption.

Ksharp
Super User

Top,

It seems that you care a great deal about IV.

But in statistical theory, IV and WoE are not very important.

What matters most is the goodness of fit of the logistic model.

To improve GOF, I try to satisfy the assumptions of the GLM (including the linearity assumption).

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

No, what I'm saying is that the process of binning tries to create the linear relationship for the GLM assumptions, but that the appearance of linearity by bin sequence number is irrelevant to the GLM assumptions, which concern the predictor and the target, not the predictor and the bin sequence number.  For a logistic regression where the only predictors are either the bin-transformed WoE variable, or the set of bin indicator functions, a better binning by IV is likely to give you a better fit; in fact, if instead of maximizing IV as your binning metric, you use entropy minimization, that's equivalent to logistic regression maximum likelihood estimation, so it certainly will give you a better fit (and in my experience, maximum IV bins are typically the same as minimum entropy bins, although I think they can disagree).  Not surprisingly, multiple regression fits can be more complicated, and the best univariate binning may not be the best contributor to a multivariate model.  (I'm ignoring the fact, for now, that many statisticians disapprove of using binned variables as regression covariates, mainly because they naturally tend to violate some GLM assumptions.)
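(To make that entropy/likelihood equivalence concrete, here is a minimal sketch with illustrative counts, not data from this thread, showing that the size-weighted entropy of a binning is exactly the negative of the maximized log-likelihood of a logistic model with one indicator per bin, so minimizing one maximizes the other:)

```python
import math

def entropy_vs_loglik(events, non_events):
    """Size-weighted binary entropy of a binning vs. the maximized
    log-likelihood of a logistic model on bin indicators (each bin's
    fitted probability is its observed event rate)."""
    entropy, loglik = 0.0, 0.0
    for e, n in zip(events, non_events):
        t = e + n
        p = e / t                                  # bin event rate
        entropy += -t * (p * math.log(p) + (1 - p) * math.log(1 - p))
        loglik += e * math.log(p) + n * math.log(1 - p)
    return entropy, loglik

ent, ll = entropy_vs_loglik([300, 200, 150], [100, 180, 250])
print(round(ent, 3), round(ll, 3))   # ent == -ll for any binning
```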

Ksharp
Super User

Hi Top,

"but that the appearance of linearity by bin sequence number is irrelevant to the GLM assumptions, which concern the predictor and the target, not the predictor and the bin sequence number."

Linearity by bin sequence number does concern the predictor and the target.

Logistic model: log(p/(1-p)) = b0 + b1*x1 + b2*x2 + ...

log(p/(1-p)) is just like WoE = ln(BadDist/GoodDist),

with p playing the role of BadDist and 1-p the role of GoodDist.
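(The analogy can actually be made exact: a bin's WoE is the bin's log-odds shifted by the overall log-odds, which is why WoE behaves like log(p/(1-p)) up to an additive constant. A quick check with made-up counts:)

```python
import math

# Made-up counts: e bads (events) and n goods (non-events) in one bin,
# with TE and TN the totals over all bins.
e, n, TE, TN = 1500, 1900, 8700, 8700

woe = math.log((e / TE) / (n / TN))             # WoE = ln(BadDist/GoodDist)
shifted = math.log(e / n) - math.log(TE / TN)   # bin log-odds minus overall log-odds
assert abs(woe - shifted) < 1e-12               # identical by algebra
```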

 

 

"that many statisticians disapprove of using binned variables as regression covariates,"

That is because binning loses a lot of the detailed information in the data,

which lowers the model's power (if the data were accurate and real, which in fact they probably are not).

Binning is more robust to bad data.

 

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

Maybe I can illustrate this another way.  Not all quantitative relationships are monotonic.  For example, in marketing, it is often found that likelihood to respond increases with the number of contacts up to a point, after which more contact is associated with lower likelihood to respond (too much contact actually annoys people, making them less likely to respond).  Linear models require monotonic relationships between predictors and targets, so regression modelers transform their original predictors to create linear relationships; binning is one way to do that.  So, in this marketing scenario, if you bin the number of contacts according to the log odds of response, you'll see the bin WoEs rise and then fall, so they won't have a linear appearance.  But you can use the bins as a linear predictor in a logistic regression; the binning has linearized the originally non-linear relationship between the response target and the number-of-contacts predictor.  So, you have the GLM linear relationship, but not a linear graph by bin sequence number.  And the fact is, even for monotonic responses, the bins will give you a linear relationship between predictor and target, whatever monotonic shape the sequence of bin WoE values resembles.
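(Here is a minimal sketch of that marketing scenario with made-up counts. Because each bin's empirical log-odds equals the overall log-odds plus that bin's WoE, the target's log-odds is exactly linear, with slope 1, in the WoE-coded predictor, even though it rises and then falls in the raw number of contacts:)

```python
import math

# Made-up response counts by number of contacts; the response rate
# rises and then falls, so log-odds is NOT monotonic in contacts.
contacts   = [1, 2, 3, 4, 5, 6]
responders = [ 50, 120, 200, 180,  90,  40]
nonresp    = [450, 480, 400, 420, 510, 560]

TR, TN = sum(responders), sum(nonresp)
overall = math.log(TR / TN)                     # overall log-odds

for x, r, n in zip(contacts, responders, nonresp):
    woe = math.log((r / TR) / (n / TN))
    # Each bin's log-odds is overall + 1.0 * WoE, so log-odds is
    # linear in the WoE coding even though WoE is not monotonic in x.
    assert abs(math.log(r / n) - (overall + woe)) < 1e-12
    print(x, round(woe, 3), round(overall + woe, 3))
```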

Ksharp
Super User

Top,

The bin sequence number could take any form, like 1, 2, 3, 4 or 1, 2, 90, 100 ...

The reason I make the steps equal is to make the WoE difference between any two groups as large as possible (that means the scores of the groups are well separated from each other).

 

 

"whatever monotonic shape the sequence of bin WoE values resembles."

Linearity is a two-dimensional notion; you cannot check linearity in only one dimension.

Assuming you get WoE values like: -0.1, 0, 0.1, 0.8

why not make it like

-0.1, 0.2, 0.5, 0.8

which has more power to distinguish (more linear, with equal step widths)?

 

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

Thank you for responding.  I don't think you're likely to see a trade-off like in your example WoE: -0.1, 0, 0.1, 0.8  OR  -0.1, 0.2, 0.5, 0.8.  Even if it were possible for the same data set to produce those two results (and I don't think it is possible, but I don't have a proof), the IV of the second binning is higher than the IV of the first binning, so there'd be no incentive to choose the first binning.  But suppose you saw the following trade-off:

 

WoE1:  -0.6, -0.2, 0.2, 0.6

WoE2:  -0.6, -0.4, 0.4, 0.6

 

The first one is linear, the second one isn't.  But I think you'll find the second one is superior in every other aspect:  higher IV, higher log-likelihood, higher chi-square, lower sum of squares, lower entropy, etc.  This is the kind of trade-off I think Siddiqi was describing.  The linearity is very appealing visually, but you sacrifice on the fit.  And the visual non-linearity of the second binning has nothing to do with its GLM properties; it will provide a better logistic regression fit than the first binning.

Ksharp
Super User

Hi Top,

I would pick the first one because it gives more distinguishable scores.

 

WoE1:  -0.6, -0.2, 0.2, 0.6

score:   -10     -5     5    10

 

WoE2:  -0.6, -0.4, 0.4, 0.6

score:   -10    -8      8     10

 

You know the score comes from the WoE; if two WoE values are very close (like -0.6 and -0.4), then their scores will also be close, as shown above.

 

I would pick the first one because the differences between scores are bigger (-10, -5 vs. -10, -8; and 5, 10 vs. 8, 10).
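(For context: scorecard points are typically a linear, PDO-style transform of WoE times the model coefficient, which is why close WoE values give close scores. A minimal sketch with hypothetical constants, chosen only to mimic the magnitudes above:)

```python
import math

pdo = 20.0                    # assumed: points needed to double the odds
factor = pdo / math.log(2)    # ~28.85 points per unit of log-odds
beta = 0.6                    # assumed logistic coefficient on the WoE variable

def points(woe):
    return round(factor * beta * woe, 1)

print([points(w) for w in (-0.6, -0.2, 0.2, 0.6)])  # -> [-10.4, -3.5, 3.5, 10.4]
print([points(w) for w in (-0.6, -0.4, 0.4, 0.6)])  # -> [-10.4, -6.9, 6.9, 10.4]
```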

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

Great.  So let's look at an example and see the consequences of your decision.  Here is our set of bins with the linear trend:

 

Bin ID   events   non-events   WoE      IV
1         3,977        2,137    0.621   0.131
2         1,230        1,000    0.207   0.005
3         1,513        1,861   -0.207   0.008
4         2,000        3,722   -0.621   0.123
Total     8,720        8,720            0.267

 

 

But it turns out that the 283 leftmost members of bin three are all events.  If you shift them to the right side of bin two, you get the following:

 

Bin ID   events   non-events   WoE      IV
1         3,977        2,137    0.621   0.131
2         1,513        1,000    0.414   0.024
3         1,230        1,861   -0.414   0.030
4         2,000        3,722   -0.621   0.123
Total     8,720        8,720            0.308

 

Not only is the information value higher, although you say you don't care so much about that, but now you've correctly classified 283 (p = 0.60) events that you misclassified (p = 0.45) in the original binning.  Was the linear appearance worth the misclassifications?  I guess that's your call.
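(For anyone who wants to reproduce the WoE and IV columns, here is a quick sketch using the counts copied from the two tables above:)

```python
import math

def woe_iv(events, non_events):
    te, tn = sum(events), sum(non_events)
    pairs = [(math.log((e / te) / (n / tn)), e / te - n / tn)
             for e, n in zip(events, non_events)]
    return ([round(w, 3) for w, _ in pairs],
            round(sum(w * d for w, d in pairs), 3))

linear  = woe_iv([3977, 1230, 1513, 2000], [2137, 1000, 1861, 3722])
shifted = woe_iv([3977, 1513, 1230, 2000], [2137, 1000, 1861, 3722])
print(linear)    # -> ([0.621, 0.207, -0.207, -0.621], 0.267)
print(shifted)   # -> ([0.621, 0.414, -0.414, -0.621], 0.308)

# Event rates for the 283 moved records: bin 3 of the first binning is
# 1513 / 3374 ~ 0.45; bin 2 of the second is 1513 / 2513 ~ 0.60.
```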

Ksharp
Super User

Hi Top,

Don't forget that is just a sample, not the whole population.

I notice your good : bad ratio is 1 : 1.

You must have done some oversampling? That means the sample is biased.

If you oversample again with a different seed, maybe you get a completely different result; maybe you get the reverse scenario.

 

In the real world, the data is very complicated and unpredictable. Your example is just a case that would happen with very low probability.

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

The example was completely hypothetical, made up data to show you the possible consequences of making a trade-off based on aesthetic rather than analytical considerations.  You preferred the binning that didn't fit as well based purely on visual appeal.  In this example, with two different ways of binning the same data, the one with better fitness statistics did a better job of classifying the data, even though it didn't have the visual linearity you prefer.  If the differences are significant, that will nearly always be the case (in both of the example binnings, the event rate differences between successive bins are all statistically significant with 99% confidence).  You have to use the data that's available to support your decision.  You can't assume that a different data sample will behave the way you want it to; instead, a careful analyst estimates the error / confidence level of the expected results, since some amount of variation will occur.

 

Your method of looking for linearity is based on evenly spacing the bin results along your horizontal axis; if the spacing was different, the graph wouldn't look linear.  But why should the spacing be even?  The bins are not likely of equal width or equal frequency.  It's an aesthetic choice, where are the analytics behind it?  Can you show me an example where a binning with visual linearity outperforms (in some measurable way) a binning of the same data that doesn't have visual linearity, but has higher IV and Somers' D and chi-square than the binning with visual linearity?

 

Ksharp
Super User

Hi Top,

 

"Your method of looking for linearity is based on evenly spacing the bin results along your horizontal axis; if the spacing was different, the graph wouldn't look linear.  But why should the spacing be even?  "

I make the bins evenly spaced to try to make the scores distinguishable from each other, as I showed above (-10, -5 vs. -10, -8). I would pick -10, -5 even though -10, -8 has a bigger IV.

 

 

 

"The bins are not likely of equal width or equal frequency.  It's an aesthetic choice, where are the analytics behind it? "

I know, and I am not asking for equal width/frequency. I use a GA (genetic algorithm) to make the WoE values linear and well separated from each other.

If the GA can't make the WoE linear, then I leave it as is.

The reason I care more about WoE linearity and separation than about IV is that it can give a better goodness-of-fit statistic.

 

 

"Can you show me an example where a binning with visual linearity outperforms (in some measurable way) a binning of the same data that doesn't have visual linearity, but has higher IV and Somers' D and chi-square than the binning with visual linearity?"

No, I can't. In reality the data is complicated. You would need many years of experience to test and compare which one is better, and I don't have that.

My point is that for a model you can't just rely on the analysis of a single variable, even if it has better IV/chi-square. You need to look at the whole model to see whether it is correctly specified and fits the data better; that is the job of GOF.

 

 

Top_Katz
Quartz | Level 8

Hi @Ksharp !

 

You often mention GOF.  Okay, which GOF statistics do you want to use?  You can apply them to the examples I gave and tell me which one is better.  Or you can create your own examples.  Paul Allison has a nice article called "Measures of Fit for Logistic Regression" and the first goodness of fit test he refers to is Pearson chi-square (as you probably know, Paul Allison is a distinguished statistician and educator).  The Pearson chi-square value for the more linear looking / lower IV set of bins is 1,192.  The Pearson chi-square value for the improved fit higher IV set of bins is 1,305.  As for distinguishing the scores, that is a good goal up to the point where it degrades your accuracy.  In the example I gave, it's not an issue because the event rates for every neighboring pair of bins in both the higher and lower IV sets are different from each other with 99% confidence according to the standard difference of ratios test.
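(For anyone checking numbers like these, the sketch below computes the standard contingency-table Pearson chi-square for a binned events/non-events table; the thread doesn't record exactly which variant of the statistic produced the figures quoted here, so results from this form need not match them:)

```python
def pearson_chisq(events, non_events):
    """Standard Pearson chi-square for a bins-by-outcome table:
    sum of (observed - expected)^2 / expected over all cells, with
    expected counts taken from the row and column margins."""
    te, tn = sum(events), sum(non_events)
    total = te + tn
    chi2 = 0.0
    for e, n in zip(events, non_events):
        row = e + n
        ee, en = row * te / total, row * tn / total  # expected cells
        chi2 += (e - ee) ** 2 / ee + (n - en) ** 2 / en
    return chi2

print(round(pearson_chisq([3977, 1230, 1513, 2000], [2137, 1000, 1861, 3722]), 1))
print(round(pearson_chisq([3977, 1513, 1230, 2000], [2137, 1000, 1861, 3722]), 1))
```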

 

The points I'm trying to get across are that:

1.  Visual linearity of equally spaced bin WoE values is completely unrelated to the linearity of the relationship between the predictor and the target, and serves no analytic purpose.  It just looks nice for story telling.

2.  For both prediction accuracy and rank ordering, it is nearly always better to use a metric such as: maximum IV, maximum chi-square, minimum entropy, minimum sum of squares, etc., rather than visual linearity, as a guide.  In particular, for binning with binary targets, minimum entropy is equivalent to logistic regression maximum likelihood.

 

You're certainly correct when you say:  "for a model , you can't just stand on a simple variable analysis"

but that still doesn't justify picking your bins with an arbitrary methodology.

Ksharp
Super User

"which GOF statistics do you want to use?  You can apply them to the examples I gave and tell me which one is better.  Or you can create your own examples.  Paul Allison has a nice article called "Measures of Fit for Logistic Regression" and the first goodness of fit test he refers to is Pearson chi-square"

Here are two GOF checks (the HL test and the calibration plot) for a logistic model. Rick's blog posts explain the details (a minimal sketch of the HL statistic follows the links):

https://blogs.sas.com/content/iml/2018/05/14/calibration-plots-in-sas.html

https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html

https://blogs.sas.com/content/iml/2018/05/31/fringe-plot-binary-logistic.html

https://blogs.sas.com/content/iml/2019/02/20/easier-calibration-plot-sas.html
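(Since the links above show SAS implementations, here is a minimal language-agnostic sketch of the HL statistic itself; real implementations differ in details such as tie handling and group cut points:)

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Minimal Hosmer-Lemeshow sketch: sort cases by predicted
    probability, cut into roughly equal groups, and accumulate
    (observed - expected)^2 / (n * pbar * (1 - pbar)) per group.
    Compare the result to a chi-square with groups - 2 df."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        ng = len(chunk)
        pbar = sum(p for p, _ in chunk) / ng   # mean predicted probability
        obs = sum(y for _, y in chunk)         # observed events
        stat += (obs - ng * pbar) ** 2 / (ng * pbar * (1 - pbar))
    return stat
```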

 

Pearson chi-square/DF = 1 tests whether the data are over-dispersed; it has nothing to do with GOF.

I have no time to test it with the classic German credit scorecard data. It depends on which variables enter the model.

 

"1.  Visual linearity of equally spaced bin WoE values is completely unrelated to the linearity of the relationship between the predictor and the target, and serves no analytic purpose.  It just looks nice for story telling."

What I try to do is make the scores more distinguishable (-10, -5 vs. -10, -8) and get better GOF, even though the IV is not bigger than yours.

 

 

"2.  For both prediction accuracy and rank ordering, it is nearly always better to use a metric such as: maximum IV, maximum chi-square, minimum entropy, minimum sum of squares, etc., rather than visual linearity, as a guide.  In particular, for binning with binary targets, minimum entropy is equivalent to logistic regression maximum likelihood."

No, I disagree with that. You need to look at the whole model, not just a single variable in isolation; there are interaction effects between variables.

What I do with the linearity of equally spaced bins is try not to violate the GLM assumptions, and to get better GOF.

 

 

 
