03-14-2017 12:52 AM
I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error with the odds ratios from PROC LOGISTIC.
The example below uses the response variable "MonthStay" and one of the variables in question, "KennelCough". MonthStay = Y and the event of interest is KennelCough = N. PROC FREQ gives me the expected odds ratio, 1.7702. PROC LOGISTIC gives me the inverse, 0.583, which doesn't seem correct.
I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations from PROC LOGISTIC? Or am I totally misunderstanding what's going on? Thanks!
Here is the PROC FREQ code and result:
proc freq data = capstone.adopts_dog order = freq;
  tables KennelCough*MonthStay / relrisk;
run;
Here is the PROC LOGISTIC code and results:
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
  class Breed(ref='Chihuahua') Gender(ref='Female') Color(ref='Black')
        Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
        FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y')
        Dental(ref='Y') Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y')
        InJuris(ref='Alameda County') InRegion(ref='East Bay SPCA - Dublin')
        OutRegion(ref='East Bay SPCA - Dublin') / param=ref;
  model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat
        SNatIn NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough
        Dental Fearful Handling UnderAge Injuris InRegion OutRegion
        / lackfit aggregate scale = none selection = backward rsquare;
  output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
Class Level Information
Odds Ratios Estimates
03-14-2017 01:25 AM
Usually that means the comparison is the inverse, i.e. Y vs N rather than N vs Y. To flip the direction, take the reciprocal of the odds ratio.
Double-check what you would expect by comparing the raw numbers. For example, if dogs with KennelCough = Y have MonthStay = Y more often than the others, then I would expect Y vs N to be above 1, so if you compare against Y as the reference (N vs Y), the number should be less than 1.
But... why would you expect the odds ratio from a full logistic regression to match the output from PROC FREQ? Once other things are factored in, the relationship changes. You could be seeing Simpson's paradox as well.
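As a quick check of the reciprocal relationship, here is a minimal sketch that inverts the two odds ratios quoted in the question. The variable names (or_freq, or_logistic, inv_freq, inv_logistic) are made up for illustration; only the two input values come from the original post.

```sas
/* Sketch: invert the two reported odds ratios and write them to the log. */
/* All variable names here are hypothetical; the inputs are from the post. */
data _null_;
  or_freq      = 1.7702;          /* odds ratio reported by PROC FREQ     */
  or_logistic  = 0.583;           /* odds ratio reported by PROC LOGISTIC */
  inv_freq     = 1 / or_freq;     /* same comparison, opposite direction  */
  inv_logistic = 1 / or_logistic;
  put inv_freq= 6.4 inv_logistic= 6.4;
run;
```

The reciprocal of 1.7702 is roughly 0.565 and the reciprocal of 0.583 is roughly 1.715, so the two procedures agree on direction once flipped but do not report exactly the same estimate.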
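One way to separate the direction question from the adjustment question is to fit a model with KennelCough as the only predictor. This is a sketch that assumes the data set and variable names from the original post, and it has not been run against that data. Apart from rows dropped for missing values and the direction of the comparison, the odds ratio from this unadjusted model should be the one comparable to the crude PROC FREQ value.

```sas
/* Sketch: unadjusted (single-predictor) model for comparison with PROC FREQ. */
/* With ref='Y' the reported odds ratio is N vs Y; change it to ref='N'       */
/* to get Y vs N instead.                                                     */
proc logistic data = capstone.adopts_dog;
  class KennelCough(ref='Y') / param=ref;
  model MonthStay(event='Y') = KennelCough;
run;
```

If this unadjusted estimate is close to the PROC FREQ value (or its reciprocal) while the full-model estimate is not, the gap comes from the other covariates rather than from an error in PROC LOGISTIC.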
03-14-2017 02:19 AM
03-14-2017 10:21 AM
PROC LOGISTIC only uses rows in the data where all of the model variables are non-missing.
Your PROC FREQ results show 5979 values used to calculate the RR. The logistic output shows 5785 values of MonthStay. So that is likely to have a noticeable impact on the result.
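To compare like with like, PROC FREQ can be rerun on only the rows that PROC LOGISTIC would keep. The sketch below assumes the variable list from the MODEL statement in the original post and uses a hypothetical data set name, work.complete_cases. CMISS counts missing values across both numeric and character arguments, so a count of 0 marks a complete row.

```sas
/* Sketch: keep only rows with no missing values in the response or in any */
/* candidate predictor. work.complete_cases is a hypothetical name.        */
data work.complete_cases;
  set capstone.adopts_dog;
  if cmiss(of MonthStay Age Gender Breed Weight Color Source EvalCat
              SNatIn NumBehvCond NumMedCond FoodAggro AnimalAggro
              KennelCough Dental Fearful Handling UnderAge Injuris
              InRegion OutRegion) = 0;
run;

/* Same table as before, now on the complete-case subset. */
proc freq data = work.complete_cases order = freq;
  tables KennelCough*MonthStay / relrisk;
run;
```

The total in this table should then be comparable to the number of observations used that PROC LOGISTIC reports.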