Re: Trying to use PROC OPTMODEL for monotonic supervised optimal binning of a continuous predictor - Page 2

Ksharp · Posted 04-29-2019 10:35 AM

Hi Top,

I found this; it is NOT necessarily linear.

But in my opinion, non-linearity violates the GLM assumptions, so it is better to make it look like a line.

Top_Katz · Posted 04-29-2019 10:55 AM

Thank you for following up. It looks like Siddiqi encourages using grouping to find what he calls "logical relationships" which are basically linear trends in the sequence, and acknowledges that you may sacrifice information value to establish that relationship: "The process of arriving at a logical trend is one of trial and error, in which one balances the creation of logical trends while maintaining a sufficient IV value." I can see the attraction of using such trend groupings, as long as they don't detract from predictive power, because of their explanatory ability.

But I still differ with you about whether a non-linear appearance violates the GLM assumption. The GLM assumption is about the relationship between the predictor and the (log odds of the) target; binning is designed specifically to try to establish that relationship, which might not exist between the target and the original predictor variable. The linear appearance, however, is a relationship between the predictor and the bin sequence number, and is not relevant to the GLM assumption.

Ksharp · Posted 04-29-2019 11:03 AM

Top,

It seems that you care a great deal about IV.

But in statistical theory, IV or WoE is not very important.

What matters most is goodness of fit (GOF) for the logistic model.

To improve GOF, I try to satisfy the GLM assumptions (including the linearity assumption).

Top_Katz · Posted 04-29-2019 11:24 AM

Hi @Ksharp !

No, what I'm saying is that the process of binning tries to create the linear relationship for the GLM assumptions, but that the appearance of linearity by bin sequence number is irrelevant to the GLM assumptions, which concern the predictor and the target, not the predictor and the bin sequence number. For a logistic regression where the only predictors are either the bin-transformed WoE variable, or the set of bin indicator functions, a better binning by IV is likely to give you a better fit; in fact, if instead of maximizing IV as your binning metric, you use entropy minimization, that's equivalent to logistic regression maximum likelihood estimation, so it certainly will give you a better fit (and in my experience, maximum IV bins are typically the same as minimum entropy bins, although I think they can disagree). Not surprisingly, multiple regression fits can be more complicated, and the best univariate binning may not be the best contributor to a multivariate model. (I'm ignoring the fact, for now, that many statisticians disapprove of using binned variables as regression covariates, mainly because they naturally tend to violate some GLM assumptions.)

Ksharp · Posted 04-29-2019 11:42 AM

Hi Top,

"but that the appearance of linearity by bin sequence number is irrelevant to the GLM assumptions, which concern the predictor and the target, not the predictor and the bin sequence number."

Linearity by bin sequence number does concern the predictor and the target.

Logistic model: log(p/(1-p)) = b0 + b1*x1 + b2*x2 + ...

log(p/(1-p)) is just like WoE = ln(BadDist/GoodDist),

with p = BadDist and 1-p = GoodDist.

"that many statisticians disapprove of using binned variables as regression covariates,"

That is because binning loses much of the detailed information in the data,

which makes the model less powerful (if the data were accurate and real, but in fact they probably are not).

Binning would be robust to bad data.

Top_Katz · Posted 04-29-2019 12:18 PM

Hi @Ksharp !

Maybe I can illustrate this another way. Not all quantitative relationships are monotonic. For example, in marketing, it is often found that likelihood to respond increases with the number of contacts up to a point, after which more contact is associated with lower likelihood to respond (too much contact actually annoys people, making them less likely to respond). Linear models require monotonic relationships between predictors and targets, so regression modelers transform their original predictors to create linear relationships; binning is one way to do that. So, in this marketing scenario, if you bin the number of contacts according to the log odds of response, you'll see the bin WoEs rise and then fall, so they won't have a linear appearance. But you can use the bins as a linear predictor in a logistic regression; the binning has linearized the originally non-linear relationship between the response target and the number-of-contacts predictor. So, you have the GLM linear relationship, but not a linear graph by bin sequence number. And the fact is, even for monotonic responses, the bins will give you a linear relationship between predictor and target, whatever monotonic shape the sequence of bin WoE values resembles.
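A minimal sketch of this point, using made-up response counts per contact-count bin (all numbers and names are hypothetical, not from any real campaign). It computes each bin's WoE as ln(share of events / share of non-events) and checks that the bin's log odds equal the overall log odds plus the WoE. In other words, the log odds are exactly linear in the WoE-coded predictor (slope 1), even though the WoE sequence rises and then falls across the bins.

```python
import math

# Hypothetical (responders, non-responders) per bin of "number of contacts".
bins = {
    "1-2 contacts": (100, 900),
    "3-5 contacts": (300, 700),
    "6-9 contacts": (250, 750),
    "10+ contacts": (50, 950),
}

total_events = sum(e for e, _ in bins.values())
total_nonevents = sum(n for _, n in bins.values())
overall_log_odds = math.log(total_events / total_nonevents)

woe = {}
for name, (events, nonevents) in bins.items():
    # WoE = ln(share of all events in the bin / share of all non-events in the bin)
    woe[name] = math.log((events / total_events) / (nonevents / total_nonevents))
    bin_log_odds = math.log(events / nonevents)
    # Log odds are linear in WoE: intercept = overall log odds, slope = 1.
    assert math.isclose(bin_log_odds, overall_log_odds + 1.0 * woe[name])
    print(f"{name:>13}: WoE = {woe[name]:+.3f}, log odds = {bin_log_odds:+.3f}")

# Check whether the WoE values are monotonic in bin order.
woe_values = list(woe.values())
is_monotonic = (woe_values == sorted(woe_values)
                or woe_values == sorted(woe_values, reverse=True))
print("WoE monotonic in bin order:", is_monotonic)
```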

Ksharp · Posted 04-30-2019 09:53 AM

Top,

The bin sequence number could take any form, like 1, 2, 3, 4 or 1, 2, 90, 100, ...

The reason I give it equal steps is to make the WoE difference between two groups as large as possible (so that the scores of the groups are clearly distinguished from each other).

"whatever monotonic shape the sequence of bin WoE values resembles."

Linearity is a two-dimensional concept. In one dimension, you cannot check linearity.

Assuming you get WoE like: -0.1 0 0.1 0.8

why not make it like

-0.1 0.2 0.5 0.8

which distinguishes the groups better (more linear, with the same step width)?

Top_Katz · Posted 04-30-2019 11:41 AM

Hi @Ksharp !

Thank you for responding. I don't think you're likely to see a trade-off like in your example WoE: -0.1, 0, 0.1, 0.8 OR -0.1, 0.2, 0.5, 0.8. Even if it were possible for the same data set to produce those two results (and I don't think it is possible, but I don't have a proof), the IV of the second binning is higher than the IV of the first binning, so there'd be no incentive to choose the first binning. But suppose you saw the following trade-off:

WoE1: -0.6, -0.2, 0.2, 0.6

WoE2: -0.6, -0.4, 0.4, 0.6

The first one is linear, the second one isn't. But I think you'll find the second one is superior in every other aspect: higher IV, higher log-likelihood, higher chi-square, lower sum of squares, lower entropy, etc. This is the kind of trade-off I think Siddiqi was describing. The linearity is very appealing visually, but you sacrifice on the fit. And the visual non-linearity of the second binning has nothing to do with its GLM properties; it will provide a better logistic regression fit than the first binning.

Ksharp · Posted 05-01-2019 09:30 AM

Hi Top,

I would pick the first one because it gives more distinguishable scores.

WoE1: -0.6, -0.2, 0.2, 0.6

score: -10 -5 5 10

WoE2: -0.6, -0.4, 0.4, 0.6

score: -10 -8 8 10

You know the score comes from the WoE; if two WoE values are very close (like -0.6 and -0.4), then their scores will be close too, as shown above.

I would pick the first one because the differences in score are bigger (-10 -5 vs. -10 -8) (5 10 vs. 8 10).

Top_Katz · Posted 05-01-2019 12:38 PM

Hi @Ksharp !

Great. So let's look at an example and see the consequences of your decision. Here is our set of bins with the linear trend:

Bin ID	events	non-events	WoE	IV
1	3,977	2,137	0.621	0.131
2	1,230	1,000	0.207	0.005
3	1,513	1,861	-0.207	0.008
4	2,000	3,722	-0.621	0.123
Total	8,720	8,720		0.267

But it turns out that the 283 leftmost members of bin three are all events. If you shift them to the right side of bin two, you get the following:

Bin ID	events	non-events	WoE	IV
1	3,977	2,137	0.621	0.131
2	1,513	1,000	0.414	0.024
3	1,230	1,861	-0.414	0.030
4	2,000	3,722	-0.621	0.123
Total	8,720	8,720		0.308

Not only is the information value higher, although you say you don't care so much about that, but now you've correctly classified 283 (p = 0.60) events that you misclassified (p = 0.45) in the original binning. Was the linear appearance worth the misclassifications? I guess that's your call.
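The WoE and IV columns in the two tables can be recomputed from the event and non-event counts. The sketch below assumes the usual definitions, WoE = ln(share of events in the bin / share of non-events in the bin) and IV contribution = (share of events - share of non-events) * WoE; the function and variable names are made up for illustration.

```python
import math

def woe_iv(bins):
    """Return (woe, iv) lists for bins given as (events, non_events) pairs."""
    total_e = sum(e for e, _ in bins)
    total_n = sum(n for _, n in bins)
    woe, iv = [], []
    for e, n in bins:
        dist_e, dist_n = e / total_e, n / total_n  # column shares
        w = math.log(dist_e / dist_n)
        woe.append(w)
        iv.append((dist_e - dist_n) * w)
    return woe, iv

# Counts copied from the two tables above.
linear_bins = [(3977, 2137), (1230, 1000), (1513, 1861), (2000, 3722)]
shifted_bins = [(3977, 2137), (1513, 1000), (1230, 1861), (2000, 3722)]

for label, bins in [("linear trend", linear_bins), ("after shift", shifted_bins)]:
    woe, iv = woe_iv(bins)
    print(label)
    for i, ((e, n), w, v) in enumerate(zip(bins, woe, iv), start=1):
        print(f"  bin {i}: event rate = {e / (e + n):.2f}, "
              f"WoE = {w:+.3f}, IV = {v:.3f}")
    print(f"  total IV = {sum(iv):.3f}")
```

The event rates printed for bin 3 of the first binning and bin 2 of the second are the p = 0.45 and p = 0.60 quoted above.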

Ksharp · Posted 05-02-2019 11:23 AM

Hi Top,

Don't forget that this is just a sample, not the whole population.

I notice your good:bad ratio is 1:1.

You must have done some oversampling? That is bound to be biased.

If you oversample again with a different seed, maybe you get a completely different result, maybe even the reverse scenario.

In the real world, data are very complicated and unpredictable. Your example is just a case with a very low probability of happening.

Top_Katz · Posted 05-02-2019 02:04 PM

Hi @Ksharp !

The example was completely hypothetical, made up data to show you the possible consequences of making a trade-off based on aesthetic rather than analytical considerations. You preferred the binning that didn't fit as well based purely on visual appeal. In this example, with two different ways of binning the same data, the one with better fitness statistics did a better job of classifying the data, even though it didn't have the visual linearity you prefer. If the differences are significant that will nearly always be the case (in both of the example binnings, the event rate differences between successive bins are all statistically significant with 99% confidence). You have to use the data that's available to support your decision. You can't assume that maybe a different data sample will behave the way you want it to, but a careful analyst estimates the error / confidence level in the expected results because some amount of difference may occur.
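One way to check a significance claim like this is a pooled two-proportion z-test on each pair of neighboring bins. The thread does not say exactly which test was used, so treat the choice of test as an assumption; 2.576 is the two-sided critical value for 99% confidence, and the helper names are illustrative.

```python
import math

Z_CRIT_99 = 2.576  # two-sided critical value for 99% confidence

def two_proportion_z(e1, n1, e2, n2):
    """Pooled two-proportion z statistic for event rates e1/n1 vs e2/n2."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def neighbor_z_scores(bins):
    """z statistics for each pair of adjacent (events, non_events) bins."""
    scores = []
    for (e1, ne1), (e2, ne2) in zip(bins, bins[1:]):
        scores.append(two_proportion_z(e1, e1 + ne1, e2, e2 + ne2))
    return scores

# Counts from the two example binnings.
linear_bins = [(3977, 2137), (1230, 1000), (1513, 1861), (2000, 3722)]
shifted_bins = [(3977, 2137), (1513, 1000), (1230, 1861), (2000, 3722)]

for label, bins in [("linear trend", linear_bins), ("after shift", shifted_bins)]:
    zs = neighbor_z_scores(bins)
    significant = all(abs(z) > Z_CRIT_99 for z in zs)
    print(label, [f"{z:.2f}" for z in zs], "all significant at 99%:", significant)
```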

Your method of looking for linearity is based on evenly spacing the bin results along your horizontal axis; if the spacing was different, the graph wouldn't look linear. But why should the spacing be even? The bins are not likely of equal width or equal frequency. It's an aesthetic choice, where are the analytics behind it? Can you show me an example where a binning with visual linearity outperforms (in some measurable way) a binning of the same data that doesn't have visual linearity, but has higher IV and Somers' D and chi-square than the binning with visual linearity?

Ksharp · Posted 05-03-2019 09:42 AM

Hi Top,

"Your method of looking for linearity is based on evenly spacing the bin results along your horizontal axis; if the spacing was different, the graph wouldn't look linear. But why should the spacing be even? "

I assume evenly spaced bins in order to make the scores distinguishable from each other, as I showed above (-10 -5 vs. -10 -8). I would pick -10 -5, even though -10 -8 has the bigger IV.

"The bins are not likely of equal width or equal frequency. It's an aesthetic choice, where are the analytics behind it? "

I know, and I don't ask for equal width/frequency. I do it with GA to make the WoE linear and the bins distinguishable from each other.

If GA can't make the WoE linear, then I leave it as it is.

The reason I care more about the WoE being linear and distinguishable than about IV is that it could give a better goodness-of-fit statistic.

"Can you show me an example where a binning with visual linearity outperforms (in some measurable way) a binning of the same data that doesn't have visual linearity, but has higher IV and Somers' D and chi-square than the binning with visual linearity?"

No, I can't. In reality, the data are complicated. You need many years of experience to test and compare which one is better, and I don't have that.

My point is that, for a model, you can't just stand on a simple variable analysis, even if it has a better IV/chi-square. You need to look at the whole model to see whether it is correctly specified and fits the data better; that is the job of GOF.

Top_Katz · Posted 05-03-2019 10:55 AM

Hi @Ksharp !

You often mention GOF. Okay, which GOF statistics do you want to use? You can apply them to the examples I gave and tell me which one is better. Or you can create your own examples. Paul Allison has a nice article called "Measures of Fit for Logistic Regression" and the first goodness of fit test he refers to is Pearson chi-square (as you probably know, Paul Allison is a distinguished statistician and educator). The Pearson chi-square value for the more linear looking / lower IV set of bins is 1,132. The Pearson chi-square value for the improved fit higher IV set of bins is 1,305. As for distinguishing the scores, that is a good goal up to the point where it degrades your accuracy. In the example I gave, it's not an issue because the event rates for every neighboring pair of bins in both the higher and lower IV sets are different from each other with 99% confidence according to the standard difference of ratios test.
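For anyone who wants to recompute the comparison, here is a sketch of the Pearson chi-square for a bins-by-outcome contingency table, using the counts from the two example tables. It is the textbook sum of (observed - expected)^2 / expected over all cells; whether this matches the exact procedure behind the quoted figures is an assumption, and the names are illustrative.

```python
def pearson_chi_square(bins):
    """Pearson chi-square for a table of (events, non_events) rows."""
    total_e = sum(e for e, _ in bins)
    total_n = sum(n for _, n in bins)
    total = total_e + total_n
    chi2 = 0.0
    for e, n in bins:
        row = e + n
        # Expected counts if outcome were independent of bin membership.
        exp_e = row * total_e / total
        exp_n = row * total_n / total
        chi2 += (e - exp_e) ** 2 / exp_e + (n - exp_n) ** 2 / exp_n
    return chi2

# Counts from the two example binnings.
linear_bins = [(3977, 2137), (1230, 1000), (1513, 1861), (2000, 3722)]
shifted_bins = [(3977, 2137), (1513, 1000), (1230, 1861), (2000, 3722)]

print("linear trend:", round(pearson_chi_square(linear_bins), 1))
print("after shift: ", round(pearson_chi_square(shifted_bins), 1))
```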

The points I'm trying to get across are that:

1. Visual linearity of equally spaced bin WoE values is completely unrelated to the linearity of the relationship between the predictor and the target, and serves no analytic purpose. It just looks nice for story telling.

2. For both prediction accuracy and rank ordering, it is nearly always better to use a metric such as: maximum IV, maximum chi-square, minimum entropy, minimum sum of squares, etc., rather than visual linearity, as a guide. In particular, for binning with binary targets, minimum entropy is equivalent to logistic regression maximum likelihood.

You're certainly correct when you say: "for a model , you can't just stand on a simple variable analysis"

but that still doesn't justify picking your bins with an arbitrary methodology.
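To make the entropy / maximum likelihood statement concrete, here is a small sketch under one stated assumption: the logistic model has one indicator per bin, so at the maximum likelihood estimate each bin's fitted probability is its observed event rate. In that case the maximized log-likelihood equals minus the sample size times the conditional entropy of the target given the bin (in nats), so the binning with lower entropy has the higher likelihood. The counts are the two example binnings; the function names are illustrative, and the code assumes no bin has zero events or zero non-events.

```python
import math

def max_log_likelihood(bins):
    """Log-likelihood of a bin-indicator logistic model at its MLE,
    where each bin's fitted probability equals its observed event rate."""
    ll = 0.0
    for e, n in bins:
        p = e / (e + n)
        ll += e * math.log(p) + n * math.log(1 - p)
    return ll

def conditional_entropy(bins):
    """Entropy (nats) of the target given the bin, weighted by bin size."""
    total = sum(e + n for e, n in bins)
    h = 0.0
    for e, n in bins:
        size = e + n
        p = e / size
        h += (size / total) * -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return h

linear_bins = [(3977, 2137), (1230, 1000), (1513, 1861), (2000, 3722)]
shifted_bins = [(3977, 2137), (1513, 1000), (1230, 1861), (2000, 3722)]

for label, bins in [("linear trend", linear_bins), ("after shift", shifted_bins)]:
    total = sum(e + n for e, n in bins)
    ll = max_log_likelihood(bins)
    h = conditional_entropy(bins)
    # The two quantities differ only by the factor -N.
    assert math.isclose(ll, -total * h)
    print(f"{label}: entropy = {h:.4f} nats, log-likelihood = {ll:.1f}")
```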

Ksharp · Posted 05-03-2019 11:34 AM

"which GOF statistics do you want to use? You can apply them to the examples I gave and tell me which one is better. Or you can create your own examples. Paul Allison has a nice article called "Measures of Fit for Logistic Regression" and the first goodness of fit test he refers to is Pearson chi-square"

Here are two GOF checks (HL test, calibration plot) for a logistic model. Rick's blog explains them in detail:

https://blogs.sas.com/content/iml/2018/05/14/calibration-plots-in-sas.html

https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html

https://blogs.sas.com/content/iml/2018/05/31/fringe-plot-binary-logistic.html

https://blogs.sas.com/content/iml/2019/02/20/easier-calibration-plot-sas.html

Pearson chi-square/DF = 1 tests whether the data are overdispersed; it has nothing to do with GOF.

I have no time to test it with the classic German scorecard data. It depends on the variables entered into the model.

"1. Visual linearity of equally spaced bin WoE values is completely unrelated to the linearity of the relationship between the predictor and the target, and serves no analytic purpose. It just looks nice for story telling."

What I am trying to do is make the scores more distinguishable (-10 -5 vs. -10 -8) and get a better GOF, although the IV is not bigger than yours.

"2. For both prediction accuracy and rank ordering, it is nearly always better to use a metric such as: maximum IV, maximum chi-square, minimum entropy, minimum sum of squares, etc., rather than visual linearity, as a guide. In particular, for binning with binary targets, minimum entropy is equivalent to logistic regression maximum likelihood."

No, I disagree with that. You need to look at the whole model, not just a single variable; there are interaction effects between variables.

What I do with linearity of equally spaced bins is try not to violate the GLM assumptions and to get a better GOF.
