
## Confusion - Interpretation of Odds Ratio

Hello All,

I am a bit confused by the odds ratio results I am getting from PROC LOGISTIC (logistic regression).

I am trying to predict churn (my response variable)

I have 10+ predictors. The two main predictors are the customer's Contract Status and Credit Score (values 1 to 5).

If I do NOT include Contract_Status among the predictors, the odds of churning for a customer with Credit_Score = 5 are more than four times the odds for a customer with Credit_Score = 1 (odds ratio of 4.7+). This makes complete business sense, since I expect these customers to churn at a very high rate.

However, as soon as I add Contract_Status (which has two levels: In Contract and Out of Contract), the odds ratio for Credit_Score changes a lot; in fact, it reverses direction. I get an odds ratio of 0.85 for customers with Credit_Score 5 vs. 1, which means lower odds of churning than for customers with Credit_Score 1. This does not seem correct.

Am I missing something in the logic?

Thanks as always.

Sachin

Accepted Solution

## Re: Confusion - Interpretation of Odds Ratio

This is a problem with most regressions when you have correlated predictor variables (which you obviously have). The effect of a variable (let's call it X1) can be in one direction when other variables are not in the model, but in the opposite direction when those other variables are in the model.

This is called multicollinearity, and there are some possible solutions. One is stepwise regression, which I despise and which has many known drawbacks. Another is to choose the variables that go into the model using variable clustering or principal components analysis; these have the drawback that variables are selected not because they are good predictors, but because of other criteria. In my opinion, the best solution (in theory, anyway) is Logistic Partial Least Squares regression, which avoids all of these drawbacks. In practice, however, that solution doesn't exist in SAS, because PROC PLS has no logistic version; it only works with continuous Y variables.
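The change of direction described above can be reproduced with plain arithmetic on a stratified table. The sketch below is only an illustration in Python (not SAS); every count in it is invented, chosen so that customers with credit score 5 are mostly out of contract. It compares the odds ratio for score 5 vs. 1 computed on the pooled data with the odds ratio computed within each contract status.

```python
def odds_ratio(exposed, reference):
    """Odds ratio from two (churned, stayed) count pairs."""
    (a, b), (c, d) = exposed, reference
    return (a / b) / (c / d)


# Hypothetical counts, invented for illustration: (churned, stayed).
counts = {
    "Out of Contract": {"score5": (306, 40), "score1": (90, 10)},
    "In Contract": {"score5": (17, 180), "score1": (50, 450)},
}

# Odds ratio for score 5 vs. 1 within each contract status.
stratum_or = {
    status: odds_ratio(group["score5"], group["score1"])
    for status, group in counts.items()
}


def pooled(score):
    """Add the counts for one credit score across both contract statuses."""
    churned = sum(group[score][0] for group in counts.values())
    stayed = sum(group[score][1] for group in counts.values())
    return churned, stayed


# Odds ratio when contract status is ignored altogether.
crude_or = odds_ratio(pooled("score5"), pooled("score1"))

print("Ignoring contract status:", round(crude_or, 2))
for status, value in stratum_or.items():
    print(status + ":", round(value, 2))
```

With these made-up counts the pooled odds ratio is well above 1, while the odds ratio inside each contract status is below 1. The pooled figure mostly reflects that score 5 customers are concentrated in the high-churn, out-of-contract group.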

--
Paige Miller

## Re: Confusion - Interpretation of Odds Ratio

Many thanks, Paige, I agree with that. I have removed a few highly correlated variables, which has eliminated this issue to some extent. I will also try building a random forest model, where I hope multicollinearity will not be an issue.

Thanks again.


## Re: Confusion - Interpretation of Odds Ratio

A more intuitive measure of the effect of a predictor in a model is its marginal effect. You might want to use the Margins macro to estimate the marginal effect for your variable in each of your models.
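As a rough illustration of what an average marginal effect measures in general (this is a Python sketch, not a reproduction of the Margins macro or its output), the code below takes a fitted logistic model, switches the credit score indicator from 1 to 5 for each customer while leaving contract status as observed, and averages the change in predicted churn probability. The coefficients and the five customers are hypothetical values made up for the example.

```python
import math


def churn_probability(score5, out_of_contract, coef):
    """Predicted churn probability from a logistic model with two 0/1 predictors."""
    logit = (
        coef["intercept"]
        + coef["score5"] * score5
        + coef["out_of_contract"] * out_of_contract
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients on the log-odds scale (not estimated from real data).
coef = {
    "intercept": math.log(1 / 9),      # odds of 1/9 for score 1, in contract
    "score5": math.log(0.85),          # odds ratio 0.85 for score 5 vs. 1
    "out_of_contract": math.log(81),   # odds ratio 81 for out of contract
}

# Hypothetical sample: 1 = out of contract, 0 = in contract.
customers = [1, 1, 1, 0, 0]

# Change in predicted probability for each customer when score goes 1 -> 5.
effects = [
    churn_probability(1, oc, coef) - churn_probability(0, oc, coef)
    for oc in customers
]
average_marginal_effect = sum(effects) / len(effects)

print("Average change in churn probability:", round(average_marginal_effect, 4))
```

The result is on the probability scale, so it reads as "percentage points of churn" rather than as a ratio of odds, which is why it can be easier to explain to a business audience.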
