proc surveylogistic data=data;
  class TARGET b c;
  model TARGET (event='1') = a b c d e f g / clparm;
  strata STRATA;
  cluster PSU;
  weight WEIGHT;
run;

proc surveyfreq data=data;
  strata STRATA;
  cluster PSU;
  weight WEIGHT;
  tables (a b c d e f g)*TARGET / RelRisk clparm;
run;
I am interested in the odds ratio for the effect of variable b on TARGET. I see different odds ratios from PROC SURVEYLOGISTIC and PROC SURVEYFREQ, and the PROC SURVEYFREQ value makes sense when I compute it manually from the weighted values. Why am I seeing different odds ratios, and what can I do to fix it? At the least, can I see the relative risk in proc surveyfreq, as that's the model I'm using?
Without your data it is hard to tell exactly.
One possible cause is that SURVEYFREQ and SURVEYLOGISTIC treat missing values a bit differently. If any variable on the MODEL statement is missing (unless it is a CLASS variable and the MISSING option is included on the CLASS statement), the entire record is excluded from the model, which is pretty common to most of the modeling procedures. Read the diagnostics to see how many records are in the data set and how many were actually used for the model.
OK, I think the key is to drop all records with missing values before running, because PROC SURVEYFREQ only excludes records with missing values of b and TARGET, whereas the regression also drops records that are missing any of the other model variables.
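One way to check this is to build a complete-case copy of the data and run both procedures on it. The following is only a sketch: the data set name complete is made up for illustration, and it assumes the eight variables listed are the only ones whose missing values matter. Keep in mind that deleting records from survey data can change the variance estimates, so treat this as a diagnostic for comparing the point estimates rather than as the final analysis.

```sas
/* Sketch: keep only records with no missing model variables.   */
/* "complete" is a hypothetical output data set name.           */
data complete;
  set data;
  /* CMISS counts missing values among numeric and character arguments */
  if cmiss(TARGET, a, b, c, d, e, f, g) = 0;
run;

/* Same records now feed both procedures */
proc surveyfreq data=complete;
  strata STRATA;
  cluster PSU;
  weight WEIGHT;
  tables b*TARGET / relrisk;
run;
```

Comparing the "Number of Observations Used" reported by SURVEYLOGISTIC on data and on complete should show whether missing values were part of the discrepancy.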
Beyond the issue of missing values, the results will still differ since the odds ratio estimates for any variable provided by SURVEYLOGISTIC are adjusted for the effects of the other variables in the model. The estimates from SURVEYFREQ are not adjusted for the other variables.
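To see the effect of adjustment directly, one option is to fit SURVEYLOGISTIC with b as the only predictor. This is a sketch, assuming b is a two-level variable so that the b*TARGET table in SURVEYFREQ is 2x2; the PARAM=REF choice is mine and not from the original post.

```sas
/* Sketch: unadjusted model with b alone, for comparison with    */
/* the b*TARGET odds ratio from PROC SURVEYFREQ.                 */
proc surveylogistic data=data;
  class b / param=ref;
  model TARGET (event='1') = b;
  strata STRATA;
  cluster PSU;
  weight WEIGHT;
run;
```

The odds ratio for b from this model should agree with the SURVEYFREQ estimate, possibly as its reciprocal depending on which levels each procedure treats as the event and the reference. Any remaining gap from the full model is then due to adjusting for a, c, d, e, f, and g.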
After accounting for missing values, this must be the reason. Is there any way I can still get the relative risk from proc surveylogistic, since I want to account for these interactions?
Use the STORE, LSMEANS, and ODS OUTPUT statements in SURVEYLOGISTIC followed by the NLMeans macro as illustrated (using PROC LOGISTIC) in this note.
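A rough sketch of that approach, adapted to the model in this thread, might look like the following. It assumes the NLMeans macro (and any macro it depends on) has already been downloaded and compiled in the session, that b is a CLASS variable, and that the macro parameters shown match the version you download; check the macro's own documentation before relying on them. The names coeffs and logmod are made up for illustration.

```sas
/* Sketch only: LSMEANS requires GLM parameterization of CLASS variables */
proc surveylogistic data=data;
  class b c / param=glm;
  model TARGET (event='1') = a b c d e f g;
  strata STRATA;
  cluster PSU;
  weight WEIGHT;
  lsmeans b / e ilink;          /* E prints the LS-means coefficients  */
  ods output coef=coeffs;       /* hypothetical name for that table    */
  store out=logmod;             /* hypothetical name for the item store */
run;

/* Assumes %NLMeans is already defined in this session */
%NLMeans(instore=logmod, coef=coeffs, link=logit, options=ratio,
         title=Relative Risk for b)
```

With link=logit and options=ratio, the macro is meant to estimate ratios of the predicted event probabilities at the levels of b, which gives a relative risk adjusted for the other variables in the model.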
I really, really, really wish that this macro had been around back when I first used PROC LOGISTIC and GENMOD. I was happily including ORs in stuff that went to study management, but they wanted everything expressed as relative risk, since that is what PROC FREQ generates and that is what they were used to.
SteveDenham