Hi All, I have built a logistic regression model using the weight-of-evidence (WoE) transformation in SAS. The model performs well in terms of both discriminatory power (AUC and Gini) and accuracy (Hosmer-Lemeshow chi-square test). However, the coefficient estimate of one of the independent variables is positive, while all other coefficients, including the intercept, are negative. Does this mean there is something wrong with including this factor in the model? I am being told that this factor needs to be dropped because of its opposite sign. Can somebody please explain whether this is true, and if so, why?
The question was posted on the 'Forecasting' board.
I have moved it to the 'Statistical Procedures' board, as it is about logistic regression and WoE predictors.
Koen
Whoever told you this is either confused or has some domain-specific knowledge that you have not shared. In general, a model can have some parameter estimates that are positive and others that are negative. It depends on the relationship between the response variable and the effect.
For example, in the PROC LOGISTIC documentation, the Getting Started example forms a model in which the Intercept term and an interaction effect have negative estimates, but the other estimates are positive. Other examples also produce models that have negative and positive parameter estimates.
I totally agree with @Rick_SAS's opinion. I think your model fits very well.
"However, the coefficient/estimate of one of the independent is coming out to be positive"
is due to your data, NOT to the logistic model.
If you delete that variable, you will find that another variable becomes positive; if you delete that one, yet another variable becomes positive. It is endless. I think your boss is an expert in the business but not in statistical modeling.
P.S. Maybe your boss thinks that all-negative estimates would be more suitable for the business in a credit scorecard?
@Ksharp wrote:
I totally agree with @Rick_SAS's opinion. I think your model fits very well.
"However, the coefficient/estimate of one of the independent is coming out to be positive"
is due to your data, NOT to the logistic model.
P.S. Maybe your boss thinks that all-negative estimates would be more suitable for the business in a credit scorecard?
Or maybe the sign of the variable, as collected, is inappropriate for this use?
"Loss", for example, might be recorded as a positive number (as in "I lost 10 dollars", where the explicit value mentioned is 10), but for a model the value should be negative (a -10 effect on the balance).
For example:
The X variable (INCOME) has a positive estimate, and the Y variable is the probability of default.
Intuitively, higher INCOME should mean a lower probability of default (that is, they should be negatively correlated).
Here, however, INCOME has a positive estimate (that is, they appear to be positively correlated).
Therefore, it is very hard to explain this scorecard to the customer.
Why would the scorecard give us a result that differs from real business logistic?
@Ksharp wrote:
Why would the scorecard give us a result that differs from real business logistic?
I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.
First check the training data and how they were collected. There you might find a reason (an Aha-Erlebnis).
If the training data are OK ... I keep emphasizing: multicollinearity is an ugly beast!
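To make the multicollinearity point concrete, here is a minimal sketch in Python (not SAS) with made-up numbers. It uses ordinary least squares instead of logistic regression so that the arithmetic stays exact, which is a simplification; the names x1, x2 and y are hypothetical. The mechanism it shows is that a predictor's sign on its own can differ from its sign once a correlated predictor is in the model.

```python
def centered_cross_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b)); the building block of OLS."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

# Hypothetical data: x1 and x2 are strongly correlated, and y = 2*x1 - x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [2.0 * a - b for a, b in zip(x1, x2)]

s11 = centered_cross_sum(x1, x1)
s22 = centered_cross_sum(x2, x2)
s12 = centered_cross_sum(x1, x2)
s1y = centered_cross_sum(x1, y)
s2y = centered_cross_sum(x2, y)

# Slope of y on x2 alone (simple regression).
marginal_slope_x2 = s2y / s22

# Coefficients when both predictors are in the model (two-predictor normal equations).
det = s11 * s22 - s12 ** 2
partial_x1 = (s22 * s1y - s12 * s2y) / det
partial_x2 = (s11 * s2y - s12 * s1y) / det

correlation_x1_x2 = s12 / (s11 * s22) ** 0.5

print("corr(x1, x2):", round(correlation_x1_x2, 3))
print("x2 alone:", marginal_slope_x2)
print("x2 with x1 in the model:", partial_x2)
```

With these numbers, x2 on its own has a slope of 1, but its coefficient is -1 once x1 is included, even though nothing is wrong with the data.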
Koen
I guess you mean "business logic" (?).
Opposite (counterintuitive) signs are always possible.
That is what I mean. But for the OP, it is very hard to explain to the customer; maybe that is the reason to drop the positive variable.
Thank you for all your inputs! I think you have it spot on. I was worried that opposite signs might not be statistically possible, but it seems to be more of a concern for business usage from a credit-scoring point of view.
I have a further question: if one sign is opposite, how do I determine the weight of each factor in the final model? Currently, I standardize all factors, (X - mean)/std, rerun the logistic regression, and then take the absolute values of the coefficients as the weights. I have a feeling this is probably not the best way and that I am going wrong somewhere.
I have a further question - if one sign is opposite, how do I determine the weight of each factor in the final model?
Using my understanding of "weight of each factor in the model", I would say no such thing is possible.
Currently, I am standardizing all factors, (X - mean)/std, and then rerunning the logistic equation - then I take the absolute value of the coefficients to arrive at the weights.
This allows you to COMPARE the regression coefficients: the one that is largest in absolute value has the biggest impact when a 1 SD change is made.
Please understand that the coefficients are not independent of each other, and the variables in your data set are correlated with each other. So there is no real concept of one variable in the data moving 1 SD while the other variables are held constant. This is theoretically and mathematically possible, but it does not happen in real data sets.
Paige Miller
Thank you. But the weighting I am trying to derive is the relative importance of each factor in the equation, since it will be used for scoring. Can you please tell me the best way to arrive at that?
Hello,
To determine the relative importance of each factor in the equation, you can indeed look at the absolute values of the standardized estimates for the parameters in the "Analysis of Maximum Likelihood Estimates" table.
But to get these standardized betas, you do not need to fit your model on standardized variables.
You can specify the STB option in the MODEL statement of PROC LOGISTIC to get them.
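For reference, the standardized estimate can also be reproduced by hand. As far as I understand the PROC LOGISTIC documentation, for the logit link it is the raw estimate multiplied by the predictor's sample standard deviation and divided by π/√3 (the standard deviation of the standard logistic distribution). The sketch below is in Python rather than SAS, and the predictor names, estimates and standard deviations are hypothetical values chosen only for illustration.

```python
import math

LOGISTIC_SD = math.pi / math.sqrt(3.0)  # SD of the standard logistic distribution

def standardized_estimate(beta, predictor_sd):
    """Raw estimate rescaled by the predictor's SD, then divided by pi/sqrt(3)."""
    return beta * predictor_sd / LOGISTIC_SD

# Hypothetical model: name -> (raw estimate, sample SD of the predictor).
model = {
    "woe_income": (-0.8, 0.50),
    "woe_age": (-1.1, 0.20),
    "woe_utilization": (0.4, 0.90),  # the one positive estimate
}

stb = {name: standardized_estimate(beta, sd) for name, (beta, sd) in model.items()}

# Rank by absolute value: the sign gives the direction, the magnitude gives the size.
ranking = sorted(stb, key=lambda name: abs(stb[name]), reverse=True)

for name in ranking:
    print(f"{name:16s} {stb[name]: .4f}")
```

Because the division by π/√3 is the same for every predictor, the ranking matches what you would get from refitting on z-scored variables. Note that the predictor with the largest raw estimate (woe_age here) is not the one with the largest standardized estimate.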
Also, a sign opposite to the one you expect can be caused by multicollinearity (between independent variables).
The question is: do you want your model only to predict well (in which case multicollinearity is less of a problem), or do you want your model to be a 'glass box' (with a focus on interpretability and explainability)? In the latter case, you should try to get rid of multicollinearity or reduce it to a reasonable amount.
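One common way to judge whether multicollinearity is at a "reasonable amount" is the variance inflation factor (VIF). Since VIF depends only on the predictors, in SAS it is usually obtained from PROC REG with the VIF option on the MODEL statement. The Python sketch below shows the calculation for the special case of two predictors, where VIF = 1 / (1 - r²); the data and variable names are hypothetical, and thresholds such as 5 or 10 are rules of thumb, not something established in this thread.

```python
def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / (s_aa * s_bb) ** 0.5

def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical WoE-coded predictors for five accounts.
woe_a = [1.0, 2.0, 3.0, 4.0, 5.0]
woe_b = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(woe_a, woe_b)
vif = vif_two_predictors(woe_a, woe_b)
print("correlation:", round(r, 3), "VIF:", round(vif, 3))
```

With more than two predictors, r² is replaced by the R² from regressing each predictor on all the others.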
Koen
As I said, even if there is no multicollinearity in the model, you could still get one positive estimate while the others are negative.
It comes from the data, not from the model.
Maybe the OP's boss wants the scorecard to be easier to explain in real business terms; that would be the reason the OP was told to drop the positive variable.