kimbekaw
Calcite | Level 5

I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation, using PROC FREQ to compute the odds ratios, I believe there is some form of error in the odds ratios from PROC LOGISTIC.

 

The example below uses the response variable "MonthStay" and one of the variables in question, "KennelCough". MonthStay = Y is the event, and the level of interest is KennelCough = N. PROC FREQ gives me the expected odds ratio, 1.7702. PROC LOGISTIC gives me roughly the inverse, 0.583, which doesn't seem correct.

 

I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations from PROC LOGISTIC? Or am I totally misunderstanding what's going on? Thanks!

 

Here is the PROC FREQ code and result:

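proc freq data = capstone.adopts_dog order = freq;
   /* ORDER=FREQ sorts levels by descending count; the RELRISK odds ratio */
   /* is the odds of column 1 in row 1 relative to row 2 in that order.   */
   tables KennelCough*MonthStay / relrisk;
run;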
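To see how the level order drives the PROC FREQ result, here is a small Python sketch (not SAS) of the 2x2 odds ratio that the RELRISK option reports, n11*n22 / (n12*n21), with rows and columns ordered by descending frequency as ORDER=FREQ does. The cell counts and the helpers `margins`, `freq_order`, and `odds_ratio_2x2` are made up for illustration and are not the poster's data. With these counts MonthStay=N is the more frequent level, so it lands in column 1 and the odds ratio describes MonthStay=N rather than the event='Y' that PROC LOGISTIC models.

```python
def margins(table, axis):
    """Total count for each level of one variable (axis 0 = row, 1 = column)."""
    totals = {}
    for key, n in table.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals


def freq_order(totals):
    """Levels sorted by descending count, mimicking ORDER=FREQ (no ties assumed)."""
    return sorted(totals, key=lambda level: -totals[level])


def odds_ratio_2x2(table, row_levels, col_levels):
    """n11*n22 / (n12*n21): odds of column 1 in row 1 relative to row 2."""
    (r1, r2), (c1, c2) = row_levels, col_levels
    return (table[(r1, c1)] * table[(r2, c2)]) / (table[(r1, c2)] * table[(r2, c1)])


# Hypothetical counts keyed by (KennelCough, MonthStay); NOT the poster's data.
counts = {
    ("N", "N"): 3000, ("N", "Y"): 1000,
    ("Y", "N"): 1200, ("Y", "Y"): 700,
}

rows = freq_order(margins(counts, 0))  # KennelCough levels, most frequent first
cols = freq_order(margins(counts, 1))  # MonthStay levels, most frequent first

or_freq_order = odds_ratio_2x2(counts, rows, cols)
# Same data read the way the PROC LOGISTIC model is set up:
# event MonthStay=Y, comparing KennelCough N vs Y.
or_event_y = odds_ratio_2x2(counts, ["N", "Y"], ["Y", "N"])

print(rows, cols, round(or_freq_order, 4), round(or_event_y, 4))
```

With these invented counts the frequency-ordered table gives 1.75, while the MonthStay=Y, KennelCough N vs Y comparison gives 1/1.75, about 0.571: the same data read in opposite directions.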

[Screenshot: PROC FREQ output (procfreq.PNG)]

 

Here is the PROC LOGISTIC code and results:

proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
   /* With param=ref, each ref= level is the baseline for that variable,    */
   /* so KennelCough(ref='Y') is reported as the odds ratio "N vs Y".       */
   class Breed(ref='Chihuahua') Gender(ref='Female')
         Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
         FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y')
         Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
         InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
         / param=ref;

   /* event='Y' models the probability that MonthStay = 'Y' */
   model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNAtIn
         NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful
         Handling UnderAge InJuris InRegion OutRegion
         / lackfit aggregate scale = none selection = backward rsquare;

   /* Individual predicted probabilities, Pearson residuals, and leverage */
   output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;

[Screenshot: PROC LOGISTIC output (proclogistic_2.PNG)]

Class Level Information

[Screenshot: proclogistic_3.PNG]

Odds Ratio Estimates

[Screenshot: proclogistic_1.PNG]

Reeza
Super User

Usually that means the comparison is reversed, i.e., Y vs N rather than N vs Y. To flip the direction, take the reciprocal of the odds ratio.

 

Double-check what you would expect by comparing the raw numbers. For example, if the share of dogs with MonthStay = Y is higher among dogs with KennelCough = Y than among the others, I would expect the Y vs N odds ratio to be above 1, so if you compare to Y instead (N vs Y), the odds ratio should be less than 1.
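Written out, swapping the reference level just swaps the numerator and denominator, so the two directions are exact reciprocals. Applied to the numbers in the original post: if 1.7702 and 0.583 described the same comparison in opposite directions, the second would be 1/1.7702, about 0.565. The remaining gap therefore has to come from the model itself (other covariates, or a different set of rows), not from the direction of the comparison alone.

```latex
\mathrm{OR}_{\text{Y vs N}}
  = \frac{\operatorname{odds}(\text{MonthStay}=\text{Y} \mid \text{KennelCough}=\text{Y})}
         {\operatorname{odds}(\text{MonthStay}=\text{Y} \mid \text{KennelCough}=\text{N})},
\qquad
\mathrm{OR}_{\text{N vs Y}} = \frac{1}{\mathrm{OR}_{\text{Y vs N}}},
\qquad
\frac{1}{1.7702} \approx 0.565
```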

 

But... why would you expect the odds ratio from a full logistic regression to match the output from PROC FREQ? Once other variables are factored in, the relationship can change. You could be seeing Simpson's paradox as well.
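Simpson's paradox is easiest to see with counts. This Python sketch uses two invented strata (think of them as, say, two intake sources; the stratum labels, counts, and the `odds_ratio` helper are hypothetical). Within each stratum the Y vs N odds ratio is above 1, yet collapsing the strata into one 2x2 table, which is all a two-way PROC FREQ table sees, gives an odds ratio well below 1. A multivariable logistic model does not literally stratify, but adjusting for a covariate tied to both the predictor and the response can shift an odds ratio in a similar way.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds ratio of group a vs group b, each given as (events, total)."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Hypothetical strata; each cell is (MonthStay=Y count, group total) for one
# level of the predictor. Invented numbers chosen to show the paradox.
strata = {
    "stratum_1": {"Y": (18, 20), "N": (80, 100)},
    "stratum_2": {"Y": (10, 100), "N": (1, 20)},
}

# Odds ratio (Y vs N) within each stratum
stratum_ors = {
    name: odds_ratio(*cells["Y"], *cells["N"]) for name, cells in strata.items()
}

# Collapse the strata into one 2x2 table, as a crude two-way analysis would
pooled = {
    level: (
        sum(cells[level][0] for cells in strata.values()),
        sum(cells[level][1] for cells in strata.values()),
    )
    for level in ("Y", "N")
}
crude_or = odds_ratio(*pooled["Y"], *pooled["N"])

for name, value in stratum_ors.items():
    print(f"{name}: OR Y vs N = {value:.3f}")
print(f"crude:     OR Y vs N = {crude_or:.3f}")
```

Here the Y level is concentrated in the stratum with the low event rate, which drags the crude odds ratio down even though Y raises the odds within both strata.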

kimbekaw
Calcite | Level 5
Hm. Good point. I had thought of Simpson's Paradox, but didn't delve into it. Might need to take another look into that.

And you're right: I'm not sure why I expected the odds ratio to be the same in the two situations. In fact, I re-ran the regression with just the KennelCough variable and, lo and behold, I got the PROC FREQ odds ratio.
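That result is what the math predicts: with a single binary predictor, the maximum-likelihood slope of a logistic model is exactly the log of the crude 2x2 odds ratio. The Python sketch below fits that one-predictor model by Newton-Raphson on invented grouped counts (the `cells` data, `fit_logistic`, and `crude_odds_ratio` are hypothetical, and this is not PROC LOGISTIC's internal code). It codes KennelCough=N as 1 and Y as 0 to mirror param=ref with ref='Y', and MonthStay=Y as the event.

```python
import math

# Hypothetical grouped data: (x, y, count), x = 1 for KennelCough=N (reference Y = 0),
# y = 1 for MonthStay=Y. Invented counts, not the poster's data.
cells = [(1, 1, 1000), (1, 0, 3000), (0, 1, 700), (0, 0, 1200)]


def fit_logistic(cells, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson on grouped counts."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += n * (y - p)
            g1 += n * (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: beta += I^{-1} * score, using the closed-form 2x2 inverse
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def crude_odds_ratio(cells):
    """Odds of y=1 when x=1 divided by odds of y=1 when x=0."""
    c = {(x, y): n for x, y, n in cells}
    return (c[1, 1] / c[1, 0]) / (c[0, 1] / c[0, 0])


b0, b1 = fit_logistic(cells)
print(f"exp(b1) = {math.exp(b1):.4f}, crude OR = {crude_odds_ratio(cells):.4f}")
```

Here exp(b1) agrees with the crude odds ratio to floating-point precision; once more predictors enter the model, that equality no longer has to hold.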

Thanks for knocking some sense into my tired brain!
ballardw
Super User

PROC LOGISTIC will only use rows in the data where all of the model variables are non-missing.

Your PROC FREQ results show 5979 observations used to calculate the relative risk. The PROC LOGISTIC output shows 5785 values of MonthStay, so about 194 rows were dropped. That is likely to have a noticeable impact on the result.
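The mechanism here is listwise (complete-case) deletion: a row with a missing value in any model variable is dropped from the whole fit, while the two-way table only needs its two variables. Below is a minimal Python sketch of that row counting on a handful of invented records. The records, the `complete_cases` helper, and the trimmed variable list are hypothetical, and None stands in for a SAS missing value.

```python
# Hypothetical records; None stands in for a SAS missing value.
records = [
    {"MonthStay": "Y", "KennelCough": "N", "Age": 3.0, "Weight": 12.5},
    {"MonthStay": "N", "KennelCough": "Y", "Age": None, "Weight": 20.0},
    {"MonthStay": "N", "KennelCough": "N", "Age": 1.5, "Weight": None},
    {"MonthStay": "Y", "KennelCough": "Y", "Age": 7.0, "Weight": 30.1},
    {"MonthStay": None, "KennelCough": "N", "Age": 2.0, "Weight": 8.0},
]


def complete_cases(rows, variables):
    """Keep rows where every listed variable is non-missing (listwise deletion)."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


# The two-way table only needs its two variables...
table_vars = ["KennelCough", "MonthStay"]
# ...while the model needs all of them (an illustrative subset of the real MODEL list).
model_vars = ["MonthStay", "KennelCough", "Age", "Weight"]

n_table = len(complete_cases(records, table_vars))
n_model = len(complete_cases(records, model_vars))
print(f"rows usable in the two-way table: {n_table}")
print(f"rows usable in the model:         {n_model}")
```

To compare like with like, one option is to restrict the PROC FREQ input to the same complete cases the model used (for example, by subsetting in a DATA step with the CMISS function over the model variables) and rerun the table.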
