Different Odds Ratio from PROC FREQ & PROC LOGISTI...

03-14-2017 12:52 AM

I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they *should be*. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error in the odds ratios from PROC LOGISTIC.

The example below uses the response variable "MonthStay" and one of the variables in question, "KennelCough". The response event is **MonthStay** = Y, and the level of interest is **KennelCough** = N. PROC FREQ gives me the expected odds ratio, 1.7702. PROC LOGISTIC gives me the inverse, 0.583, which doesn't seem correct.

I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations from PROC LOGISTIC? Or am I totally misunderstanding what's going on? Thanks!

Here is the PROC FREQ code and result:

```
proc freq data = capstone.adopts_dog order = freq;
tables KennelCough*MonthStay / relrisk;
run;
```

Here is the PROC LOGISTIC code and results:

```
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
   /* With param=ref, KennelCough(ref='Y') makes the KennelCough odds ratio
      compare N vs Y, adjusted for the other effects kept in the model */
   class Breed(ref='Chihuahua') Gender(ref='Female') Color(ref='Black')
         Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
         FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y')
         Dental(ref='Y') Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y')
         InJuris(ref='Alameda County')
         InRegion(ref='East Bay SPCA - Dublin')
         OutRegion(ref='East Bay SPCA - Dublin') / param=ref;
   model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat
         SNatIn NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough
         Dental Fearful Handling UnderAge Injuris InRegion OutRegion
         / lackfit aggregate scale = none selection = backward rsquare;
   output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
```

[Output not shown: **Class Level Information** and **Odds Ratio Estimates** tables]


03-14-2017 01:25 AM

Usually that means the comparison is the inverse, i.e. Y vs N rather than N vs Y. To flip the direction, you invert the odds ratio.

Double check what you would expect by comparing the raw numbers. For example, if dogs with kennel cough = Y are more likely to have MonthStay = Y than the other dogs, I would expect the Y vs N odds ratio to be above 1, so if you compare to Y (N vs Y), the number should be less than 1.
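To make the inversion concrete, here is the crude (unadjusted) 2×2 arithmetic, i.e. the kind of odds ratio PROC FREQ reports, not the adjusted estimate from the full model. The cell labels are hypothetical: a and b are the MonthStay = Y and MonthStay = N counts among KennelCough = N, and c and d are the same counts among KennelCough = Y.

```latex
\begin{aligned}
\mathrm{OR}_{N\ \text{vs}\ Y} &= \frac{a/b}{c/d} = \frac{ad}{bc}, \\
\mathrm{OR}_{Y\ \text{vs}\ N} &= \frac{c/d}{a/b} = \frac{bc}{ad} = \frac{1}{\mathrm{OR}_{N\ \text{vs}\ Y}}, \\
\frac{1}{1.7702} &\approx 0.565.
\end{aligned}
```

Applied to the PROC FREQ figure, the reversed comparison would be about 0.565, which is close to but not the same as the 0.583 from the full model.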

But... why would you expect the odds ratio from a full logistic regression to match the output from PROC FREQ? Once other variables are factored in, the relationship can change. You could be seeing Simpson's paradox as well.


03-14-2017 02:19 AM

Hm. Good point. I had thought of Simpson's Paradox, but didn't delve into it. Might need to take another look into that.

And you're right: I'm not sure why I expected the odds ratio to be the same between the two situations. In fact, I re-ran the regression with just the KennelCough variable and, lo and behold, I got the PROC FREQ odds ratio.

Thanks for knocking some sense into my tired brain!
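A minimal sketch of that single-predictor model, assuming the same dataset and reference coding as the original post. With no other covariates, its odds ratio is a crude one, so it should line up with the 2×2 odds ratio from PROC FREQ when both use the same rows and the same comparison direction.

```sas
/* Single-predictor logistic model: no adjustment for other effects.
   With param=ref and ref='Y', the KennelCough odds ratio compares
   N vs Y for the event MonthStay = 'Y'. Switching to ref='N' would
   report the Y vs N comparison, which is the reciprocal. */
proc logistic data = capstone.adopts_dog;
   class KennelCough(ref='Y') / param=ref;
   model MonthStay(event='Y') = KennelCough;
run;
```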



03-14-2017 10:21 AM

PROC LOGISTIC only uses rows in the data where all of the model variables are non-missing.

Your PROC FREQ results show 5979 observations used to calculate the RR (relative risk), while the PROC LOGISTIC output shows 5785 values of MonthStay. That difference is likely to have a noticeable impact on the result.
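One way to check this is to rerun PROC FREQ on only the complete cases. The sketch below assumes the variable list from the MODEL statement above; `work.complete_cases` is a hypothetical dataset name. CMISS counts missing values for both numeric and character variables, so the subsetting IF keeps only rows with nothing missing on the response or any candidate predictor, which should roughly mirror the rows the full PROC LOGISTIC model can use.

```sas
/* Keep only rows with no missing values on the response or on any
   predictor listed in the full MODEL statement (hypothetical output
   dataset name: work.complete_cases). */
data work.complete_cases;
   set capstone.adopts_dog;
   if cmiss(of MonthStay Age Gender Breed Weight Color Source EvalCat
               SNatIn NumBehvCond NumMedCond FoodAggro AnimalAggro
               KennelCough Dental Fearful Handling UnderAge InJuris
               InRegion OutRegion) = 0;
run;

/* Same crude 2x2 analysis as before, now on the restricted rows */
proc freq data = work.complete_cases order = freq;
   tables KennelCough*MonthStay / relrisk;
run;
```

If the crude odds ratio on these rows still differs from the full-model estimate, the remaining gap comes from adjusting for the other predictors rather than from the missing rows.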