Programming the statistical procedures from SAS

Different Odds Ratio from PROC FREQ & PROC LOGISTIC

New Contributor
Posts: 2

Different Odds Ratio from PROC FREQ & PROC LOGISTIC

I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error in the odds ratios from PROC LOGISTIC.

 

The example below uses the response variable "MonthStay" and one of the variables in question, "KennelCough". The response event is MonthStay = Y and the level of interest is KennelCough = N. PROC FREQ gives me the expected odds ratio, 1.7702. PROC LOGISTIC gives me the inverse, 0.583, which doesn't seem correct.

 

I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations from PROC LOGISTIC? Or am I totally misunderstanding what's going on? Thanks!

 

Here is the PROC FREQ code and result:

proc freq data = capstone.adopts_dog order = freq;
tables KennelCough*MonthStay / relrisk;
run;

procfreq.PNG

 

Here is the PROC LOGISTIC code and results:

proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect); 

class Breed(ref='Chihuahua') Gender(ref='Female') 
Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y') 
Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
/ param=ref;

model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNatIn
NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful 
Handling UnderAge Injuris InRegion OutRegion 

/ lackfit aggregate scale = none selection = backward rsquare;
output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;

proclogistic_2.PNG

 

Class Level Information

proclogistic_3.PNG

 

Odds Ratios Estimates

proclogistic_1.PNG

Grand Advisor
Posts: 16,924

Re: Different Odds Ratio from PROC FREQ & PROC LOGISTIC

Usually that means the comparison is the inverse, i.e., Y vs N rather than N vs Y. To flip the direction, you invert the odds ratio.

 

Double-check what you would expect by comparing the raw numbers. That is, since the proportion of dogs with MonthStay = Y is higher among dogs with KennelCough = Y than among the others, I would expect the Y vs N odds ratio to be above 1, so if Y is the reference level (N vs Y), the number should be less than 1.
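
As a rough check using the two values quoted in the question (A and B below are just placeholder labels for the two KennelCough levels, not something from the output), swapping the reference level inverts the odds ratio:

```latex
\mathrm{OR}_{B\ \mathrm{vs}\ A} = \frac{1}{\mathrm{OR}_{A\ \mathrm{vs}\ B}},
\qquad \frac{1}{1.7702} \approx 0.565,
\qquad \frac{1}{0.583} \approx 1.715
```

So 0.583 is close to, but not exactly, the inverse of 1.7702.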

 

But... why would you expect the odds ratio from a full logistic regression to match the output from PROC FREQ? Once the other variables are factored in, the relationship changes. You could be seeing Simpson's paradox as well.

New Contributor
Posts: 2

Re: Different Odds Ratio from PROC FREQ & PROC LOGISTIC

Hm. Good point. I had thought of Simpson's Paradox, but didn't delve into it. Might need to take another look into that.

And you're right: I'm not sure why I expected the odds ratio to be the same in the two situations. In fact, I re-ran the regression with just the KennelCough variable and, lo and behold, I got the PROC FREQ odds ratio.
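
For anyone following along, a single-predictor run of this kind might look roughly like the sketch below. It assumes the same data set and Y/N coding as in the original post; the exact code used for the re-run was not posted, so treat the options as illustrative.

```sas
/* Sketch only: unadjusted model with KennelCough as the sole predictor. */
proc logistic data = capstone.adopts_dog;
    /* ref='Y' reports N vs Y; change to ref='N' to report Y vs N instead */
    class KennelCough(ref='Y') / param=ref;
    model MonthStay(event='Y') = KennelCough;
run;
```

With no other covariates in the model, the odds ratio is unadjusted, which is why it can be compared directly with the 2x2 table from PROC FREQ (up to the direction of the comparison).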

Thanks for knocking some sense into my tired brain!
Grand Advisor
Posts: 10,062

Re: Different Odds Ratio from PROC FREQ & PROC LOGISTIC

PROC LOGISTIC will only use rows in the data where all of the model variables are non-missing.

Your PROC FREQ results show 5979 values used to calculate the relative risk estimates. The PROC LOGISTIC output shows 5785 values of MonthStay. That difference is likely to have a noticeable impact on the result.
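
One way to put the two procedures on the same footing is to run PROC FREQ on only the complete cases. The following is a minimal sketch under stated assumptions: the variable list is copied from the MODEL statement in the question, and the data set name work.complete_cases is made up for illustration.

```sas
/* Keep only rows with no missing value in the response or any predictor. */
/* CMISS counts missing arguments and accepts numeric and character alike. */
data work.complete_cases;   /* hypothetical data set name */
    set capstone.adopts_dog;
    if cmiss(MonthStay, Age, Gender, Breed, Weight, Color, Source, EvalCat,
             SNAtIn, NumBehvCond, NumMedCond, FoodAggro, AnimalAggro,
             KennelCough, Dental, Fearful, Handling, UnderAge, InJuris,
             InRegion, OutRegion) = 0;
run;

/* Same table as in the question, restricted to the complete cases. */
proc freq data = work.complete_cases order = freq;
    tables KennelCough*MonthStay / relrisk;
run;
```

If the row count here matches the number of observations PROC LOGISTIC reports as used, any remaining gap between the two odds ratios comes from the other covariates rather than from missing data.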
