Statistical Procedures

logistic regression: sign of estimates/coefficients

adjn258 · Posted 06-09-2022 04:27 PM

Hi All, I have built a logistic regression model using weight of evidence (WoE) transformation in SAS. My model shows good performance in terms of discriminatory power (AUC and Gini) and accuracy (Hosmer-Lemeshow chi-square test). However, the coefficient/estimate of one of the independent variables is coming out to be positive (all other coefficients, including the intercept, are negative). Does this mean there is something wrong with including this factor in the model? I am being told this factor needs to be dropped because of its opposite sign. Can somebody please explain whether this is true, and if so, why?

sbxkoenk · Posted 06-10-2022 05:38 AM

The question was in the 'Forecasting' board.

I have moved it to the 'Statistical Procedures' board as it is about Logistic Regression and WoE predictors.

Koen

Rick_SAS · Posted 06-10-2022 06:04 AM

Whoever told you this is either confused or has some domain-specific knowledge that you have not shared. In general, a model can have some parameter estimates that are positive and others that are negative. It depends on the relationship between the response variable and the effect.

For example, in the PROC LOGISTIC documentation, the Getting Started example forms a model in which the Intercept term and an interaction effect have negative estimates, but the other estimates are positive. Other examples also produce models that have negative and positive parameter estimates.

Ksharp · Posted 06-10-2022 08:08 AM

I totally agree with @Rick_SAS's opinion. I think your model fits very well.

"However, the coefficient/estimate of one of the independent is coming out to be positive"

is due to your data, NOT the logistic model.

If you delete that variable, you will find another variable is positive; if you delete that one, you will find yet another variable is positive. It is endless. I think your boss is an expert in business but a novice in statistical modeling.

P.S. Maybe your boss thinks all-negative coefficients would be more suitable to the business for a credit scorecard?

ballardw · Posted 06-12-2022 05:19 AM

@Ksharp wrote:

I totally agree with @Rick_SAS's opinion. I think your model fits very well.

"However, the coefficient/estimate of one of the independent is coming out to be positive"

is due to your data, NOT the logistic model.

P.S. Maybe your boss thinks all-negative coefficients would be more suitable to the business for a credit scorecard?

Or maybe the sign of a variable as collected is inappropriate for use this way?

"Loss", for example, might be provided as positive (as in "I lost 10 dollars", where the explicit value mentioned is 10), but for a model the value should be negative (a -10 effect on balance).

Ksharp · Posted 06-13-2022 08:39 AM

I think the OP's boss wants the scorecard to conform to the business logistic (i.e., to have a better and more reasonable business explanation).
For example:
The X variable (INCOME) has a positive estimate, and the Y variable is the probability of default.
In our minds, more INCOME should mean a lower probability of default (i.e., they are negatively correlated),
whereas here INCOME has a positive estimate (i.e., they are positively correlated).
Therefore, it is very hard to explain this scorecard to a customer.
Why would the scorecard give us a different result from the real business logistic?

sbxkoenk · Posted 06-13-2022 08:49 AM

@Ksharp wrote:
Why would the scorecard give us a different result from the real business logistic?

I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.

First check the training data and how they were collected. There you might find a reason (an aha moment).

If the training data are OK, ... I keep emphasizing: multicollinearity is an ugly beast!

Koen

Ksharp · Posted 06-13-2022 09:07 AM

I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.

That is what I mean. But for the OP, it is very hard to explain to a customer; maybe that is the reason to drop that positive variable.

adjn258 · Posted 06-12-2022 08:39 AM

Thank you for all your inputs! I think you have it spot on! I was worried that maybe it is not statistically possible for signs to be opposite. But it seems to be more of a worry for business usage from a credit scoring point of view.

I have a further question: if one sign is opposite, how do I determine the weight of each factor in the final model? Currently, I am standardizing all factors ((X - mean)/std) and then rerunning the logistic equation; then I take the absolute value of the coefficients to arrive at the weights. I have a feeling this is probably not the best way and that I am going wrong somewhere.

PaigeMiller · Posted 06-12-2022 08:55 AM

I have a further question: if one sign is opposite, how do I determine the weight of each factor in the final model?

Using my understanding of "weight of each factor in the model", I would say no such thing is possible.

Currently, I am standardizing all factors ((X - mean)/std) and then rerunning the logistic equation; then I take the absolute value of the coefficients to arrive at the weights.

This allows you to COMPARE the regression coefficients; the largest in absolute value will have the biggest impact when a 1 SD change is made.

Please understand that the coefficients are not independent of each other, and the variables in your data set are correlated with each other. So there is no real concept of a variable in the data moving 1 SD while the other variables are held constant. This is theoretically and mathematically possible, but it does not happen in real data sets.

--
Paige Miller

adjn258 · Posted 06-12-2022 08:59 AM

Thank you. But the idea of weightage (what I am trying to derive) is the relative importance of each factor in the equation, since it will be used for scoring. Can you please tell me what would be the best way to arrive at that?

adjn258 · Posted 06-12-2022 09:00 AM

Just to clarify: that is, with all other variables held constant.

adjn258 · Posted 06-12-2022 09:09 AM

Apologies for this question. I re-read your answer, and I think you've already answered this.

sbxkoenk · Posted 06-12-2022 09:12 AM

Hello,

To determine the relative importance of each factor in the equation, you can indeed look at the (absolute value of the) standardized estimates for the parameters in the "Analysis of Maximum Likelihood Estimates" table.

But to get these standardized betas, you do not need to fit your model on standardized variables.

You can specify the STB option on the MODEL statement of PROC LOGISTIC to get these.
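As a minimal sketch of the STB suggestion above: the data set work.woe_demo, the variables woe_income, woe_age, and bad, and the simulated coefficients are all hypothetical and exist only so the step runs on its own. With real data, only the PROC LOGISTIC step would be needed.

```sas
/* Hypothetical simulated data, for illustration only */
data work.woe_demo;
   call streaminit(2022);
   do id = 1 to 500;
      woe_income = rand('normal');
      woe_age    = rand('normal');
      /* arbitrary log-odds used only to generate a binary response */
      eta = -2 - 0.8*woe_income - 0.5*woe_age;
      bad = rand('bernoulli', 1/(1 + exp(-eta)));
      output;
   end;
   drop eta;
run;

/* STB adds standardized estimates to the                      */
/* "Analysis of Maximum Likelihood Estimates" table             */
proc logistic data=work.woe_demo;
   model bad(event='1') = woe_income woe_age / stb;
run;
```

The standardized estimates appear as an extra column in that table, so the variables do not have to be standardized by hand before fitting.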

Also, a sign opposite to the one you expect can be caused by multicollinearity (between independent variables).
The question is: do you want your model only to predict well (in that case multicollinearity is less of a problem), or do you want your model to be a 'glass box' (with a focus on interpretability and explainability)? In the latter case, you should try to get rid of multicollinearity or reduce it to a reasonable amount.

Koen

Ksharp · Posted 06-13-2022 08:47 AM

I think the CORRB option in the MODEL statement of PROC LOGISTIC could help detect multicollinearity.
As I said, even if there is no multicollinearity in the model, you could still get one positive estimate while the others are negative.
It comes from the data, not from the model.
Maybe the OP's boss wants the scorecard to be more explainable in real business terms; that is the reason the OP was told to drop that positive variable.
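A rough sketch of the CORRB check mentioned above, with a PROC CORR step added as an assumption of this sketch (it is not something suggested in the thread) to look at the predictors directly. The data set work.collin_demo and its variables are hypothetical; woe_balance is deliberately constructed to be highly correlated with woe_income.

```sas
/* Hypothetical data: two deliberately correlated predictors */
data work.collin_demo;
   call streaminit(613);
   do id = 1 to 500;
      woe_income  = rand('normal');
      /* woe_balance is built to track woe_income closely */
      woe_balance = 0.9*woe_income + 0.3*rand('normal');
      eta = -2 - 0.8*woe_income - 0.3*woe_balance;
      bad = rand('bernoulli', 1/(1 + exp(-eta)));
      output;
   end;
   drop eta;
run;

/* Pairwise correlation between the predictors themselves */
proc corr data=work.collin_demo;
   var woe_income woe_balance;
run;

/* CORRB prints the estimated correlation matrix of the parameter estimates */
proc logistic data=work.collin_demo;
   model bad(event='1') = woe_income woe_balance / corrb;
run;
```

Note that CORRB reports correlations among the parameter estimates, not among the predictors, so large off-diagonal values there are a hint to investigate rather than a formal collinearity diagnostic.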
