sachin01663
Obsidian | Level 7

Hello All,

 

I am a bit confused by the odds ratio results I am getting from PROC LOGISTIC (logistic regression).

I am trying to predict churn (my response variable).

I have 10+ predictors. The two main predictors are the customer's Contract_Status and Credit_Score (values 1 to 5).

 

If I do NOT include Contract_Status among the predictors, the odds of churning for a customer with Credit_Score = 5 are more than four times those of a customer with Credit_Score = 1 (odds ratio of 4.7+). This makes complete business sense, and I expect these customers to churn at a very high rate.

 

However, as soon as I add Contract_Status (which has 2 levels: In Contract and Out of Contract), the odds ratio for Credit_Score changes a lot; in fact, it reverses direction. I get an odds ratio of .85 for customers with Credit_Score 5 vs 1, which means lower odds of churning than for customers with Credit_Score 1. This is not correct.

 

Am I missing something in the logic?

 

Thanks as always. 

Sachin

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This is a problem with most regressions, when you have correlated predictor variables (which obviously you have). The effect of a variable (let's call this variable X1) can be in one direction when other variables are not in the model, but the effect of X1 can be in a different direction when those other variables are in the model.
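
To make the reversal concrete, here is a minimal Python sketch using made-up counts (they are NOT the poster's data, just numbers chosen to mimic the situation described). It assumes customers with credit score 5 are mostly out of contract and that out-of-contract customers churn far more. The function names are only for illustration.

```python
# Hypothetical counts, invented for illustration only.
# Each tuple is (churned, stayed) for one credit score group.
strata = {
    "In Contract":     {"score5": (9, 91),    "score1": (90, 810)},
    "Out of Contract": {"score5": (414, 486), "score1": (50, 50)},
}

def odds_ratio(exposed, reference):
    """Odds ratio of churn for `exposed` vs `reference`, each (churned, stayed)."""
    a, b = exposed
    c, d = reference
    return (a * d) / (b * c)

def collapse(tables, group):
    """Add up (churned, stayed) for one group across all strata."""
    churned = sum(t[group][0] for t in tables.values())
    stayed = sum(t[group][1] for t in tables.values())
    return churned, stayed

def mantel_haenszel(tables):
    """Mantel-Haenszel pooled odds ratio across the strata."""
    num = den = 0.0
    for t in tables.values():
        a, b = t["score5"]
        c, d = t["score1"]
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Odds ratio when contract status is ignored (the "model without Contract_Status").
marginal_or = odds_ratio(collapse(strata, "score5"), collapse(strata, "score1"))

# Odds ratio within each contract status (the "model with Contract_Status").
stratum_ors = {name: odds_ratio(t["score5"], t["score1"]) for name, t in strata.items()}
mh_or = mantel_haenszel(strata)

print(f"ignoring contract status: {marginal_or:.2f}")
for name, value in stratum_ors.items():
    print(f"within {name}: {value:.2f}")
print(f"pooled across contract status: {mh_or:.2f}")
```

With these invented counts the odds ratio is well above 1 when contract status is ignored, but below 1 inside each contract status group. Both numbers are arithmetically correct; they answer different questions.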

 

It is called multicollinearity. There are some possible solutions. One is stepwise regression, which I despise and which has many known drawbacks. Another is to choose the variables that go into the model using variable clustering or principal components analysis; the drawback there is that variables are selected on criteria other than how well they predict the response. In my opinion, the best solution (in theory, anyway) is Logistic Partial Least Squares regression, which avoids all of these drawbacks. In practice, however, that solution doesn't exist in SAS, because PROC PLS has no logistic version; it only works with continuous Y variables.

--
Paige Miller


3 REPLIES 3
sachin01663
Obsidian | Level 7

Many thanks, Paige, and I agree with that. I have removed a few highly correlated variables, which has eliminated this issue to some extent. I will also try building a random forest model, and I am hoping multicollinearity will not be an issue there.
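
For anyone wanting to check how strongly two categorical predictors are associated before deciding what to drop, one common measure is Cramér's V. Below is a rough Python sketch; the counts are invented for illustration (not real data), and `cramers_v` is just a helper name used here.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows are Credit_Score 1 and 5,
# columns are In Contract and Out of Contract.
counts = [
    [900, 100],
    [100, 900],
]
v = cramers_v(counts)
print(f"Cramér's V = {v:.2f}")
```

Values near 0 mean the two predictors are close to independent; values near 1 mean one predictor nearly determines the other, which is the situation where odds ratios can shift a lot when both are in the model.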

 

Thanks again. 

StatDave
SAS Super FREQ

A more intuitive measure of the effect of a predictor in a model is its marginal effect. You might want to use the Margins macro to estimate the marginal effect for your variable in each of your models. 
