adjn258
Calcite | Level 5

Hi All,   I have built a logistic regression model using weight of evidence (WoE) transformation in SAS.   My model shows good performance in terms of discriminatory power (AUC and Gini) and accuracy (Hosmer-Lemeshow chi-square test).   However, the coefficient/estimate of one of the independent variables is coming out positive (all other coefficients, including the intercept, are negative). Does this mean there is something wrong with including this factor in the model? I am being told this factor needs to be dropped because of its opposite sign. Can somebody please explain whether this is true, and if so, why?

15 REPLIES
sbxkoenk
SAS Super FREQ

Question was in 'Forecasting' board.

I have moved it to the 'Statistical Procedures' board, as it is about Logistic Regression and WoE predictors.

 

Koen

Rick_SAS
SAS Super FREQ

Whoever told you this is either confused or has some domain-specific knowledge that you have not shared. In general, a model can have some parameter estimates that are positive and others that are negative. It depends on the relationship between the response variable and the effect. 

 

For example, in the PROC LOGISTIC documentation, the Getting Started example forms a model in which the Intercept term and an interaction effect have negative estimates, but the other estimates are positive. Other examples also produce models that have negative and positive parameter estimates.

Ksharp
Super User

I totally agree with @Rick_SAS's opinion. I think your model fits very well.

"However, the coefficient/estimate of one of the independent is coming out to be positive "

is due to your data, NOT to the logistic model.

If you delete that variable, you may find that another variable becomes positive; if you delete that one, yet another variable may become positive. It is endless. I think your boss knows the business well but not statistical modeling.

P.S. Maybe your boss thinks all-negative coefficients would be more suitable to the business for a credit scorecard?

ballardw
Super User

@Ksharp wrote:

I totally agree with @Rick_SAS's opinion. I think your model fits very well.

"However, the coefficient/estimate of one of the independent is coming out to be positive "

is due to your data, NOT to the logistic model.

P.S. Maybe your boss thinks all-negative coefficients would be more suitable to the business for a credit scorecard?


Or maybe the sign of a variable as collected is inappropriate for use this way?

 

"Loss" for example might be provided as positive (as in "I lost 10 dollars", the explicit value mentioned is 10) but for a model the value should be negative (-10 effect on balance)
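The recoding point can be checked with a few lines of arithmetic. The sketch below is in Python rather than SAS, and every number in it is hypothetical; it only shows that flipping the sign of a predictor flips the sign of its coefficient while leaving the predicted probabilities unchanged.

```python
import math

def event_probability(intercept, coef, x):
    """Logistic model: P(event) = 1 / (1 + exp(-(intercept + coef * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))

# Hypothetical values, for illustration only.
intercept = -2.0
coef_loss_positive = 0.05  # coefficient when a 10-dollar loss is stored as +10

loss_as_positive = [0.0, 10.0, 25.0, 40.0]
loss_as_negative = [-v for v in loss_as_positive]  # same losses stored as -10, -25, ...

# With the sign of the variable flipped, the equivalent coefficient is the negative one.
p_positive_coding = [event_probability(intercept, coef_loss_positive, x) for x in loss_as_positive]
p_negative_coding = [event_probability(intercept, -coef_loss_positive, x) for x in loss_as_negative]

for a, b in zip(p_positive_coding, p_negative_coding):
    print(round(a, 6), round(b, 6))  # each pair is identical
```

So a coefficient with an "unexpected" sign may simply reflect how the variable was coded, not a different model.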

Ksharp
Super User
I think OP's boss wants the scorecard to conform to the business logistic (i.e., to have a better and more reasonable business explanation).
For example:
Suppose the X variable (INCOME) has a positive estimate and the Y variable is the probability of default.
Intuitively, higher INCOME should mean a lower probability of default (i.e., a negative correlation).
Here, however, INCOME has a positive estimate (i.e., a positive correlation).
Therefore, it is very hard to explain this scorecard to a customer.
Why would the scorecard give us a result that differs from the real business logistic?
sbxkoenk
SAS Super FREQ

@Ksharp wrote:
Why Score Card would give us a different result with real business logistic ?

I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.

First check the training data and how they were collected. There you might find a reason (an "aha" moment).

 

If training data are OK, ... I keep emphasizing, multi-collinearity is an ugly beast !

 

Koen

Ksharp
Super User

I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.

That is what I mean. But for the OP, it is very hard to explain to a customer; maybe that is the reason to drop that positive variable.

adjn258
Calcite | Level 5

Thank you for all your inputs! I think you have it spot on! I was worried that maybe it is not statistically possible for signs to be opposite. But it seems to be more of a worry for business usage from a credit scoring view.

I have a further question - if one sign is opposite, how do I determine the weight of each factor in the final model? Currently, I am standardizing all factors ((X - mean)/std) and then rerunning the logistic equation - then I take the absolute value of the coefficients to arrive at the weights. I have a feeling this is probably not the best way and that I am going wrong somewhere.

PaigeMiller
Diamond | Level 26

I have a further question - if one sign is opposite, how do I determine the weight of each factor in the final model?

 

Using my understanding of "weight of each factor in the model", I would say no such thing is possible.

 

 

Currently, I am standardizing all factors ((X - mean)/std) and then rerunning the logistic equation - then I take the absolute value of the coefficients to arrive at the weights.

 

This allows you to COMPARE the regression coefficients, the largest in absolute value will have the biggest impact when a 1SD change is made.
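As a rough illustration of that comparison (a Python sketch, not SAS output; the variable names and coefficient values are made up), the standardized coefficients can be ranked by absolute value and turned into odds ratios per 1-SD change:

```python
import math

# Hypothetical coefficients from a logistic model fitted on standardized predictors.
standardized_coefs = {
    "woe_income": -0.80,
    "woe_age": -0.35,
    "woe_utilization": 0.20,  # the one predictor with the opposite sign
}

# Rank by absolute value: the sign gives the direction, the magnitude gives the
# size of the change in log-odds for a 1-SD change in that predictor.
ranking = sorted(standardized_coefs, key=lambda name: abs(standardized_coefs[name]), reverse=True)

# Odds ratio for a +1 SD change in one predictor, the others held fixed.
odds_ratio_per_sd = {name: math.exp(b) for name, b in standardized_coefs.items()}

for name in ranking:
    print(name, standardized_coefs[name], round(odds_ratio_per_sd[name], 3))
```

Taking the absolute value only affects the ranking; the sign still tells you whether the odds go up or down.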

 

Please understand that the coefficients are not independent of each other, and the variables in your data set are correlated with each other. So there is no real concept of a variable in the data moving 1SD while the other variables are held constant; this is theoretically and mathematically possible, but it does not happen in real data sets.

--
Paige Miller
adjn258
Calcite | Level 5

Thank you. But the idea of the weightage I am trying to derive is: what is the relative importance of each factor in the equation, since it will be used for scoring? Can you please tell me what would be the best way to arrive at that?

adjn258
Calcite | Level 5
Just to clarify: that is, with all other variables held constant.
adjn258
Calcite | Level 5
Apologies for this question - I re-read your answer, and I think you've already answered this.
sbxkoenk
SAS Super FREQ

Hello,

 

To determine the relative importance of each factor in the equation, you can indeed look at the (absolute value of the) standardized estimates for the parameters in the "Analysis of Maximum Likelihood Estimates" table.

 

But to get these standardized betas, you do not need to fit your model on standardized variables.

You can specify the STB option on the MODEL statement of PROC LOGISTIC to get these.
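For intuition about what such a standardized estimate is, here is a small Python sketch (not SAS code). It assumes the convention I understand the PROC LOGISTIC documentation to describe for the logit link: the raw estimate multiplied by the predictor's sample standard deviation and divided by π/√3, the standard deviation of the standard logistic distribution. The data and raw estimates are hypothetical.

```python
import math
import statistics

def standardized_estimate(raw_estimate, x_values):
    """Raw estimate rescaled by SD(x) / (pi / sqrt(3)); assumed logit-link convention."""
    s_x = statistics.stdev(x_values)          # sample standard deviation (n - 1)
    s_logistic = math.pi / math.sqrt(3)       # SD of the standard logistic distribution
    return raw_estimate * s_x / s_logistic

# Hypothetical raw estimates and predictor values, for illustration only.
raw_estimates = {"woe_income": -1.2, "woe_age": -0.9}
predictor_values = {
    "woe_income": [-0.6, -0.2, 0.0, 0.3, 0.7],
    "woe_age": [-0.3, -0.1, 0.0, 0.1, 0.2],
}

stb = {name: standardized_estimate(raw_estimates[name], predictor_values[name])
       for name in raw_estimates}
print(stb)
```

Because π/√3 is the same constant for every predictor, ranking by these values gives the same order as ranking the coefficients from a refit on standardized variables.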

 

Also, a sign opposite to the one you expect can be caused by multicollinearity (between independent variables).
The question is: do you want your model only to predict well (in which case multicollinearity is less of a problem), or do you want your model to be a 'glass box' (with a focus on interpretability and explainability)? In the latter case, you should try to get rid of multicollinearity or reduce it to a reasonable amount.
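How correlated predictors can flip a sign is easiest to see with exact arithmetic. The Python sketch below deliberately uses ordinary least squares on made-up data instead of logistic regression, because the numbers then work out by hand; the same kind of reversal can occur with logistic coefficients. All names and values are hypothetical.

```python
# Two strongly correlated hypothetical predictors and a response built as y = 2*x1 - x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [2.0 * a - b for a, b in zip(x1, x2)]

def centered_cross_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11, s22, s12 = centered_cross_sum(x1, x1), centered_cross_sum(x2, x2), centered_cross_sum(x1, x2)
s1y, s2y = centered_cross_sum(x1, y), centered_cross_sum(x2, y)

# Slope of y on x2 alone (x1 ignored): positive.
marginal_slope_x2 = s2y / s22

# Slopes when both predictors are in the model (two-predictor normal equations).
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det  # negative, although x2 alone looks positive

print(marginal_slope_x2, b1, b2)  # 1.0 2.0 -1.0
```

On its own, x2 appears to raise y, but once x1 is in the model its coefficient is negative. Neither number is "wrong"; they answer different questions.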

Koen

Ksharp
Super User
I think the CORRB option on the MODEL statement of PROC LOGISTIC could help detect multicollinearity.
As I said, even if there is no multicollinearity in the model, you could still get one positive estimate while the others are negative.
It comes from the data, not from the model.
Maybe OP's boss wants the scorecard to be easier to explain in real business terms; that is the reason OP was told to drop that positive variable.


Discussion stats
  • 15 replies
  • 3086 views
  • 0 likes
  • 6 in conversation