RiskViya
Fluorite | Level 6

Can SAS constrain regression coefficients the way H2O does? In H2O, the non_negative option of H2OGeneralizedLinearEstimator does this, and Glum (a Python package) can give coefficients a lower or upper bound. When we model risk scorecard, all the parameter coefficients are the same direction (all negative or positive) after variable woe.

Accepted Solution
sbxkoenk
SAS Super FREQ

Hello,

 

You could use PROC HPGENSELECT and its RESTRICT statement.

The RESTRICT statement enables you to specify linear equality or inequality constraints among the parameters of a model. These restrictions are incorporated into the maximum likelihood analysis.
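To make the idea of a bounded coefficient concrete, here is a minimal conceptual sketch in plain Python. It is not how HPGENSELECT, H2O, or Glum implement their constraints; it uses a swapped-in technique, projected gradient descent, which takes an ordinary gradient step on the logistic negative log-likelihood and then clips every slope back into [lower, upper]. The function name `fit_bounded_logistic`, the step size, the iteration count, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_bounded_logistic(X, y, lower, upper, lr=0.1, n_iter=5000):
    """Hypothetical helper: logistic regression by projected gradient descent.

    Returns [b_1, ..., b_p, intercept]. After every gradient step each slope
    b_j is clipped back into [lower[j], upper[j]]; the intercept is unbounded.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[p] + sum(beta[j] * xi[j] for j in range(p))
            r = sigmoid(eta) - yi  # derivative of the negative log-likelihood w.r.t. eta
            for j in range(p):
                grad[j] += r * xi[j]
            grad[p] += r
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
        for j in range(p):
            beta[j] = min(max(beta[j], lower[j]), upper[j])  # projection onto the box
    return beta


# Hypothetical toy data. The first six rows repeat three points with both
# labels, which keeps the unconstrained fit finite (no separation).
X = [[0, 0], [1, 0], [0, 1], [0, 0], [1, 0], [0, 1], [2, 0], [2, 1], [0, 2], [1, 2]]
y = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

free = fit_bounded_logistic(X, y, [-math.inf] * 2, [math.inf] * 2)
nonneg = fit_bounded_logistic(X, y, [0.0, 0.0], [math.inf, math.inf])
print("unconstrained:", [round(b, 3) for b in free])
print("slopes >= 0:  ", [round(b, 3) for b in nonneg])
```

For the "all negative" convention you would pass `lower=[-math.inf] * p` and `upper=[0.0] * p` instead. A general linear inequality such as the RESTRICT statement allows needs a proper constrained optimizer; simple clipping only handles per-coefficient bounds.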

 

You might expect all WOE variables to have the same sign (based on the bivariate relationship between the target and each input), but because of multicollinearity, for example, one or two WOE variables in the full model can still get a sign opposite to the one you expect.
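A small made-up example shows the multicollinearity effect. Both inputs below move with the target on their own, yet the joint least-squares fit gives one of them a negative slope. Ordinary least squares is used here only because its two-predictor solution can be written in closed form; the same mechanism can flip a sign in a logistic model on WOE variables. The helper names and the data are hypothetical.

```python
def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_predictor_ols(x1, x2, y):
    """Slopes (b1, b2) of the least-squares fit y = a + b1*x1 + b2*x2."""
    c1, c2, cy = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as x1 and x2 become collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical inputs with corr(x1, x2) = 0.8; y = 2*x1 - x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4]
y = [1, 1, 4, 3, 6]

# On its own, each input moves with y (positive centered cross-product) ...
print("x1 vs y:", dot(center(x1), center(y)), " x2 vs y:", dot(center(x2), center(y)))
# ... but in the joint fit x2 gets a negative slope.
b1, b2 = two_predictor_ols(x1, x2, y)
print("joint slopes:", b1, b2)
```

Here the marginal cross-products are 12 and 6 (both positive), while the joint slopes are 2 and -1, because x2 mostly duplicates x1 and its coefficient ends up correcting for the overlap.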

See also (although this uses PROC REG):
Restricted least squares regression in SAS
By Rick Wicklin on The DO Loop September 16, 2020
https://blogs.sas.com/content/iml/2020/09/16/restricted-regression-sas.html

 

Koen

Ksharp
Super User
“When we model risk scorecard, all the parameter coefficients are the same direction (all negative or positive) after variable woe.”

That is NOT true. @Rick_SAS discussed this topic in one of his blog posts.
RiskViya
Fluorite | Level 6

If the original variables are used in a GLM regression, a coefficient can be positive or negative depending on how each variable is related to the target variable, but after WOE coding, the correlation symbol should be consistent, and the review of the model by the strategy is also more acceptable to the model with the same parameter symbol.

MikaellaK
Calcite | Level 5

Hello,

 

I have a similar query. As part of a scorecard development, I have transformed all explanatory variables using WOE and established a monotonic relationship (increasing or decreasing) of each variable with the response variable. Should all coefficients be negative when I apply a logistic regression to the transformed explanatory variables?

PaigeMiller
Diamond | Level 26

@MikaellaK wrote:

Hello,

 

I have a similar query. As part of a scorecard development, I have transformed all explanatory variables using WOE and established a monotonic relationship (increasing or decreasing) of each variable with the response variable. Should all coefficients be negative when I apply a logistic regression to the transformed explanatory variables?


Does not the above discussion answer this question?

--
Paige Miller
Ksharp
Super User

No. Absolutely not.

@Rick_SAS wrote a wonderful blog post, "Simpson's Paradox," that explains this problem.

https://blogs.sas.com/content/iml/2023/03/27/simpsons-paradox.html


Suppose X stands for monthly income and Y stands for the probability of default.

Looking at all the data together, you would see a linear, decreasing relationship between X and Y.

But if you take the AGE variable into account, you would see the reverse. Surprise?

So you can't constrain all the parameters to be positive or negative.

Rick also pointed out that doing so would reduce the accuracy of the model's predictions.
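The income/default/AGE story above can be reproduced with a handful of made-up points. The sketch below compares the pooled slope of Y on X with the slope after adjusting for an age group. "Adjusted" here means the X coefficient in a least-squares model that also has an age-group indicator, which equals the slope computed from data demeaned within each group. Linear least squares, the function names, and all numbers are hypothetical choices for illustration.

```python
# Hypothetical numbers: x = monthly income (thousands), y = default rate (%).


def ols_slope(xs, ys):
    """Slope of the simple least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_group_slope(groups):
    """X coefficient in y ~ x + group indicators, via within-group demeaning."""
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


groups = {
    "younger": ([1, 2, 3], [5, 6, 7]),
    "older": ([6, 7, 8], [1, 2, 3]),
}
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]

pooled = ols_slope(all_x, all_y)        # ignores AGE: negative
adjusted = within_group_slope(groups)   # accounts for AGE: positive
print(f"pooled slope:   {pooled:+.3f}")
print(f"adjusted slope: {adjusted:+.3f}")
for name, (xs, ys) in groups.items():
    print(f"{name} group slope: {ols_slope(xs, ys):+.3f}")
```

The pooled slope is negative (about -0.63) while the slope within each age group, and the AGE-adjusted slope, is +1. Forcing the X coefficient to be negative in the model that includes AGE would therefore push it away from what the data say.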

 

 

PaigeMiller
Diamond | Level 26

Adding to the comments from @Ksharp 

 

Let's suppose you have at least these two variables in your scorecard model: FICO and the number of delinquencies in the last 24 months. These have opposite effects: as FICO goes up, the score should go up; as the number of delinquencies in the last 24 months goes up, the score should go down. The coefficients in the model for these two variables SHOULD have opposite signs; there is nothing wrong with this. In fact, restricting the coefficients to have the same signs would make the model fit worse and cause the scores to do illogical things.

--
Paige Miller
RiskViya
Fluorite | Level 6
The variables are monotonically divided into bins and WOE encoded, which is consistent with the correlation of the target variable. Therefore, the case you mentioned above applies to the original variables, whereas the scorecard was developed using the WOE-encoded variables.
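The point about WOE encoding can be checked on binned counts. The sketch assumes one common convention, WOE = ln(%good / %bad) per bin (some references use the reciprocal, which flips every sign), and uses hypothetical good/bad counts for FICO bins and delinquency bins. The helper names are made up for illustration.

```python
import math


def woe_by_bin(bins):
    """Map each bin label to WOE = ln(%good / %bad) (one common convention)."""
    total_good = sum(good for good, bad in bins.values())
    total_bad = sum(bad for good, bad in bins.values())
    return {
        label: math.log((good / total_good) / (bad / total_bad))
        for label, (good, bad) in bins.items()
    }


def bad_rate_falls_as_woe_rises(bins):
    """True if ordering the bins by WOE gives a strictly decreasing bad rate."""
    woe = woe_by_bin(bins)
    ordered = sorted(bins, key=woe.get)
    rates = [bins[label][1] / sum(bins[label]) for label in ordered]
    return all(a > b for a, b in zip(rates, rates[1:]))


# Hypothetical (good, bad) counts per bin; not data from this thread.
fico = {"<600": (100, 50), "600-700": (300, 40), ">700": (600, 10)}
delinquencies = {"0": (700, 20), "1-2": (250, 40), "3+": (50, 40)}

for name, bins in [("FICO", fico), ("delinquencies (24m)", delinquencies)]:
    woe = woe_by_bin(bins)
    print(name, {k: round(v, 3) for k, v in woe.items()})
    print("  bad rate falls as WOE rises:", bad_rate_falls_as_woe_rises(bins))
```

WOE rises with FICO but falls with the delinquency count, so after encoding, a higher WOE means a lower bad rate for both variables. This only describes each variable on its own; it says nothing about the signs the variables receive once they are fitted together in one model.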

Ksharp
Super User
1) "but after WOE coding, the correlation symbol should be consistent"
Why do you think that would happen? Whether you use the original variables or the WOE variables to build the model, both are generalized LINEAR models.
Notice that both fit a LINEAR effect, not a non-linear effect.

2) "the review of the model by the strategy is also more acceptable to the model with the same parameter symbol."
Maybe you are right, but that is a business consideration (i.e., it could give a better explanation of the scorecard and fit business rules better),
NOT a statistical one.
In statistical theory there is no need to constrain coefficients to be positive or negative.

3) "The variables are monotonically divided into bins and WOE encoded, which is consistent with the correlation of the target variable."
They are the same thing: both fit a LINEAR effect, monotonic between X and Y as you said.
Even worse, when you bin the original variable into WOE, you lose information compared with the original variable.
That is why @Rick_SAS suggests building the model with the original variables rather than WOE.
But in a scorecard, WOE can give a better explanation than the original variables.
And that is no reason to think the estimated coefficients should all be positive or all be negative.
If you check the SAS documentation for scorecards,
there is also an example that includes both positive and negative coefficients.
By your reasoning, is the SAS documentation WRONG?

P.S. I totally agree with Paige's opinion.

4) "The variables are monotonically divided into bins and WOE encoded, which is consistent with the correlation of the target variable."
But under the influence of other variables (or other WOE variables), the correlation could reverse (i.e., positive becomes negative).
Rick_SAS
SAS Super FREQ

With respect, I have never made any public statements about building scorecards or using WOE. KSharp, when you want to encourage an OP to read something that I wrote, it would be good to provide a link so that the OP can read exactly what I said, including the context in which I said it.

 

I encourage the OP to think about the comments from Paige and others who have pointed out that the parameters in a regression model should not be artificially constrained without a good reason. Doing so can reduce the accuracy of the model's predictions.
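One way to see why an artificial restriction can only cost fit on the estimation data: write $\ell(\beta)$ for the model's log-likelihood and $C$ for the set of coefficient vectors that satisfy the restriction (for example, all slopes $\ge 0$). Maximizing over a subset can never beat maximizing over all of $\mathbb{R}^p$. This is a general optimization fact about in-sample fit, not a statement about any particular SAS procedure; how much accuracy on new data changes is an empirical question.

$$
\max_{\beta \in C} \ell(\beta) \;\le\; \max_{\beta \in \mathbb{R}^{p}} \ell(\beta), \qquad C \subseteq \mathbb{R}^{p}
$$

Equality holds when an unrestricted maximizer already satisfies the restriction. When the unrestricted fit wants an opposite sign for some coefficient, the restriction binds and the restricted fit is usually strictly worse.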


Discussion stats
  • 10 replies
  • 5116 views
  • 2 likes
  • 6 in conversation