I'm running a logistic regression model with main effects and two interaction terms. Both interaction terms in the model (ageten*education and ageten*homeown) were significant (p<0.05) and I wanted to have stratified odds ratios with one reference group (40-49 & post-secondary education, 40-49 & owned home) for each interaction.
Code 1:
proc surveylogistic data=survey.family; /* model with both interactions */
weight wt;
class ageten (ref='40-49') sex (ref='Male') state (ref='Florida')
income (ref='Above LIM') homeown (ref='Owned') education (ref='Post-secondary')
race (ref='White') immigration (ref='Non-immigrant')/ param = glm;
model outcome(event='A') = ageten sex state income homeown education race immigration ageten*education ageten*homeown/ expb clodds nodummyprint;
run;
However, when I include expb in code 1, I get odds ratios where the reference group is different for each stratum. For example, the reference group for '10-19 year old living in a rented home' is '10-19 year old living in an owned home', and the reference group for '20-29 year old living in a rented home' is '20-29 year old living in an owned home'.
I've also tried combining the variables involved in the interaction into a single variable (shown in code 2 below); however, there seem to be problems with multicollinearity if age is included in both interactions. I was able to get the stratified odds ratios with only one reference group in a model controlling for other sociodemographic characteristics, but I had to fit a separate model for each interaction/combined variable. Also, I assume that doing it this way means I wouldn't be able to include the two individual variables in the model.
Code 2:
/* Ageten_homeown interaction coding */
data survey.family;
set survey.family;
if ageten=4 and homeown=1 then agehome=0;
else if ageten=1 and homeown=1 then agehome=1;
else if ageten=2 and homeown=1 then agehome=2;
else if ageten=3 and homeown=1 then agehome=3;
else if ageten=1 and homeown=2 then agehome=4;
else if ageten=2 and homeown=2 then agehome=5;
else if ageten=3 and homeown=2 then agehome=6;
else if ageten=4 and homeown=2 then agehome=7;
run;
proc format;
value agehome 0='40-49 owned' 1='10-19 owned' 2='20-29 owned'
3='30-39 owned' 4= '10-19 rented' 5='20-29 rented' 6='30-39 rented' 7='40-49 rented';
run;
proc surveylogistic data=survey.family;
class agehome (ref='40-49 owned') sex (ref='Male') state (ref='Florida') education (ref='Post-secondary')
income (ref='Above LIM') race (ref='White') immigration (ref='Non-immigrant')/ param=ref;
model outcome(event='A')= agehome sex state education income race immigration;
weight wt;
title 'Interaction ageten*homeownership on outcome A in adjusted model';
format agehome agehome.;
run;
What I would like to generate are odds ratios for each combination compared to one single reference group (like 40-49 year old living in owned home). I'm hoping to get a table of odds ratios that looks like this for both significant interaction terms included in the same model that's adjusted for other characteristics:
Is there a way to produce odds ratios like that in the surveylogistic procedure? If not, is code 2 used for two models the best way to approach this? Thanks.
This can be done with the LSMEANS or LSMESTIMATE statement. The easiest way is to use the LSMEANS statement with the DIFF and ODDSRATIO options. The following statement produces all of the pairwise odds ratios among the combinations of levels of the two predictors; you then pick out the ones you want from the full set.
lsmeans ageten*homeown / diff oddsratio;
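For context, here is a minimal sketch of where that statement might sit in the model from code 1. It assumes the interactions are specified as ageten*education and ageten*homeown, and it keeps param=glm on the CLASS statement, since the LSMEANS statement needs GLM parameterization. The CL option and the second LSMEANS statement for ageten*education are additions for illustration, not something from the original code.

```sas
proc surveylogistic data=survey.family;
  weight wt;
  class ageten (ref='40-49') sex (ref='Male') state (ref='Florida')
        income (ref='Above LIM') homeown (ref='Owned') education (ref='Post-secondary')
        race (ref='White') immigration (ref='Non-immigrant') / param=glm;
  model outcome(event='A') = ageten sex state income homeown education race immigration
        ageten*education ageten*homeown;
  /* all pairwise odds ratios among the age-by-home-ownership combinations */
  lsmeans ageten*homeown / diff oddsratio cl;
  /* assumption: the same request for the age-by-education combinations */
  lsmeans ageten*education / diff oddsratio cl;
run;
```

The differences table lists every pair, so the rows of interest are those where one side is the chosen reference combination (for example 40-49 and Owned).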
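If you want only specific comparisons, use LSMESTIMATE statements to compute them. The coefficients in each statement follow the order of the combinations shown in the output from the LSMEANS statement, with one coefficient per combination. The statements below illustrate the syntax for a generic diagnosis*treatment interaction with six combinations; each computes the odds ratio comparing two particular combinations. For your model, substitute ageten*homeown (or ageten*education) and supply one coefficient for each of its combinations: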
lsmestimate diagnosis*treatment '1vs6' 1 0 0 0 0 -1 / exp;
lsmestimate diagnosis*treatment '2vs6' 0 1 0 0 0 -1 / exp;
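Adapted to the ageten*homeown interaction from the question, a sketch might look like the following. It assumes eight combinations (four age groups by two home-ownership levels) and that the reference combination (40-49, Owned) is listed eighth in the LSMEANS output. That ordering is an assumption, so check the LSMEANS table and move the -1 if the reference sits elsewhere. The labels are placeholders. These statements go inside the same PROC SURVEYLOGISTIC step as the MODEL statement.

```sas
/* each statement: exp(LS-mean of one combination minus LS-mean of combination 8) */
/* assumption: combination 8 is the 40-49, Owned reference; verify in the LSMEANS output */
lsmestimate ageten*homeown 'combo 1 vs combo 8' 1 0 0 0 0 0 0 -1 / exp cl;
lsmestimate ageten*homeown 'combo 2 vs combo 8' 0 1 0 0 0 0 0 -1 / exp cl;
lsmestimate ageten*homeown 'combo 3 vs combo 8' 0 0 1 0 0 0 0 -1 / exp cl;
lsmestimate ageten*homeown 'combo 7 vs combo 8' 0 0 0 0 0 0 1 -1 / exp cl;
```

The remaining combinations follow the same pattern: a 1 in the position of the combination of interest and a -1 in the position of the reference combination.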