Confusion - Interpretation of Odds Ratio

🔒 This topic is **solved** and **locked**.
Posted 10-08-2018 06:55 AM
(2027 views)

Hello All,

I am a bit confused by the odds ratio results I am getting from PROC LOGISTIC (logistic regression).

I am trying to predict churn (my response variable).

I have 10+ predictors. The two main ones are the customer's contract status and credit score (values 1 to 5).

If I do NOT include Contract_Status among the predictors, the odds of churning for a customer with Credit_Score = 5 are more than four times the odds for a customer with Credit_Score = 1 (odds ratio of 4.7+). This makes perfect business sense, and I expect these customers to churn at a very high rate.

However, as soon as I add Contract_Status (which has 2 levels: In Contract and Out of Contract), the odds ratio for Credit_Score changes a lot; in fact, it reverses direction. I get an odds ratio of 0.85 for customers with Credit_Score 5 vs. 1, which means lower odds of churning than for customers with Credit_Score 1. This does not look correct.

Am I missing something in the logic?

Thanks as always.

Sachin

Accepted Solutions


This is a problem with most regressions when the predictor variables are correlated (which yours obviously are). The effect of a variable (call it X1) can point in one direction when other variables are left out of the model and in the opposite direction once they are included. In your case, the odds ratio for Credit_Score in the larger model compares customers who have the same Contract_Status, so it answers a different question than the odds ratio from the model without Contract_Status.
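To see how such a reversal can arise, here is a minimal Python sketch (not SAS) using invented counts; none of these numbers come from the original data. They are chosen so that customers with Credit_Score 5 are mostly Out of Contract, where churn is high for everyone. Within each contract status the odds ratio for score 5 vs. 1 is 0.85, yet pooling over contract status gives an odds ratio of about 4.7.

```python
# Hypothetical counts, invented purely for illustration (not the poster's data).
# counts[contract_status][credit_score] = (churned, stayed)
counts = {
    "In Contract":     {1: (100, 900), 5: (17, 180)},
    "Out of Contract": {1: (20, 20),   5: (340, 400)},
}


def odds_ratio(exposed, reference):
    """Odds ratio of churn for `exposed` vs. `reference`, each a (churned, stayed) pair."""
    (a, b), (c, d) = exposed, reference
    return (a / b) / (c / d)


def pooled(score):
    """Add up (churned, stayed) for one credit score across both contract statuses."""
    churned = sum(by_score[score][0] for by_score in counts.values())
    stayed = sum(by_score[score][1] for by_score in counts.values())
    return churned, stayed


# Odds ratio for score 5 vs. 1 among customers with the same contract status
stratified = {
    status: odds_ratio(by_score[5], by_score[1])
    for status, by_score in counts.items()
}

# Odds ratio for score 5 vs. 1 when contract status is ignored
marginal = odds_ratio(pooled(5), pooled(1))

for status, value in stratified.items():
    print(f"{status}: OR(5 vs 1) = {value:.2f}")   # 0.85 in both strata
print(f"Ignoring contract status: OR(5 vs 1) = {marginal:.2f}")  # 4.72
```

Because the two stratum-specific odds ratios are equal in this made-up table, a main-effects logistic model with both predictors would report that same value, while the pooled figure corresponds to the model that leaves Contract_Status out.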

It is called multicollinearity, and there are several possible solutions. One is stepwise regression, which I despise and which has many known drawbacks. Another is to choose the variables that go into the model by variable clustering or principal components analysis; the drawback is that variables are then selected by criteria other than how well they predict the response. In my opinion, the best solution (in theory, anyway) is logistic partial least squares regression, which avoids all of these drawbacks. In practice, however, that solution doesn't exist in SAS: PROC PLS has no logistic version and only works with continuous Y variables.
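Before picking a remedy, it can help to quantify how strongly two categorical predictors are associated. The following is a rough Python sketch (not SAS) that computes Cramér's V from a cross-tabulation; the `crosstab` counts and the function name `cramers_v` are made up for illustration and are not from the original data.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    0 means no association between the row and column variables;
    1 means one variable fully determines the other.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical counts: rows are Credit_Score 1..5,
# columns are (In Contract, Out of Contract).
crosstab = [
    [1000, 40],
    [800, 120],
    [600, 300],
    [400, 500],
    [197, 740],
]

print(f"Cramér's V = {cramers_v(crosstab):.2f}")
```

A value near 0 suggests the two predictors carry mostly separate information. A value well above 0 is a hint that the coefficient of one may shift noticeably when the other enters the model.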

--

Paige Miller

Replies


Many thanks, Paige, and I agree with that. I have removed a few highly correlated variables, which has eliminated this issue to some extent. I will also try building a random forest model, hoping that multicollinearity will not be an issue there.

Thanks again.
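For anyone wanting to screen for highly correlated variables in a similar way, one simple approach is to flag every pair of numeric predictors whose absolute Pearson correlation exceeds a cutoff. This is a Python sketch under assumptions: the column names, the values, and the 0.8 threshold are all hypothetical, and a pairwise check like this will not catch collinearity that involves three or more variables at once.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictor columns, invented for illustration only.
columns = {
    "tenure_months": [1, 5, 12, 24, 36, 48],
    "months_since_signup": [2, 6, 13, 25, 37, 49],
    "monthly_spend": [30, 55, 20, 45, 25, 50],
}

flagged = correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Which variable of a flagged pair to drop is a judgment call; the code only lists candidates.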


- Tags:
- marginal effects
