Hello,
I ran a logistic regression model assessing the interaction between distance (a continuous variable) and SES (a categorical variable with two levels). I then wanted to calculate the predicted probabilities of the outcome for each level of the interaction. However, I understand that this cannot be done with a continuous variable in the interaction term. Thus, I used the LSMEANS statement to evaluate the predicted probability for each level of SES at specific values of distance. My question is about the interpretation of the output. The interaction term was significant in my original logistic regression model: with the first level of SES as the reference, the interaction between distance and the second level of SES was significantly associated with the outcome. However, once I calculated the predicted probabilities, these values were not significantly different from one another. Does this mean that overall, as distance changes, there is an association with the outcome among those in the second SES group, but that at these specific values of distance, there is no difference between the two groups?
Thank you.
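/* Model 1: reference-cell coding, used to test the SES*distance interaction */
proc logistic data="H:\desktop\DataAnalysis" descending;
   class SES (ref='1') / param=ref;
   model stage = SES | distance;
   where sample = 1;
run;

/* Model 2: same model with GLM coding, which the LSMEANS statement requires */
proc logistic data="H:\desktop\DataAnalysis" descending;
   class SES (ref='1') / param=glm;
   model stage = SES | distance;
   /* Compare the SES levels at three fixed values of distance */
   lsmeans SES / at distance=2.67 ilink or cl diff;
   lsmeans SES / at distance=7.60 ilink or cl diff;
   lsmeans SES / at distance=24.51 ilink or cl diff;
   where sample = 1;
run;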
"I then wanted to calculate the predicted probabilities of the outcome for each level of the interaction. However, I understand that this cannot be done with a continuous variable in the interaction term."
This is not correct. Predicted values can be created from any model. If you are using PROC LOGISTIC, you can do this using the OUTPUT statement, and several other ways. See:
https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
You will have to specify values of the continuous variable to predict at, as you tried to do using LSMEANS. You could also use the EFFECTPLOT statement to draw a picture of the interaction. To answer your question about the LSMEANS: in this case, the LSMEANS are points on the lines shown in the interaction plot, one point per SES level at each value of distance you specified.
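As a rough sketch of both suggestions, something like the following could be tried. It is not the poster's code: the data set names work.analysis, work.grid and work.pred, the item store work.stage_model, and the output variable names p_hat, p_lower and p_upper are all made up for illustration, and the sketch assumes that stage is binary and that SES is a numeric variable coded 1 and 2.

```sas
ods graphics on;

/* work.analysis is a hypothetical stand-in for the poster's data set */
proc logistic data=work.analysis descending;
   where sample = 1;
   class SES (ref='1') / param=glm;
   model stage = SES | distance;
   /* Predicted probability against distance, one curve per SES level */
   effectplot slicefit(x=distance sliceby=SES) / clm;
   /* Save the fitted model so it can be scored later with PROC PLM */
   store work.stage_model;
run;

/* Scoring grid: every SES level at the three distances used in LSMEANS. */
/* Assumes SES is numeric with values 1 and 2; adjust if it is character. */
data work.grid;
   do SES = 1 to 2;
      do distance = 2.67, 7.60, 24.51;
         output;
      end;
   end;
run;

/* ILINK puts the predictions and confidence limits on the probability scale */
proc plm restore=work.stage_model;
   score data=work.grid out=work.pred
         predicted=p_hat lclm=p_lower uclm=p_upper / ilink;
run;

proc print data=work.pred;
run;
```

The six rows of work.pred should correspond to the back-transformed LSMEANS from the original code, since both evaluate the same fitted model at the same SES and distance values.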
Thank you for your reply. Would the predicted probabilities produced this way be different from those I've obtained with my code?
Please provide more information. Please explain what you did. Please provide code. Please provide output.
The predicted values that SAS produces, I assume, are correct. If you are obtaining different predicted values, I would assume you have made a mistake somewhere.
The significant interaction tells you that the difference between the SES groups changes depending on the distance, or equivalently that the effect of distance depends on SES. Think of the slopes of the two distance curves, one per SES level, at each of various values of distance. But note that you are talking about the SES difference or the distance effect in terms of the log odds, not the event probability. This is discussed in more detail in this note. See, in particular, the Poisson model example on rates and the use of the Margins macro at the end, which compares the event probabilities at selected values of the continuous predictor (as well as comparing slopes). A similar approach can be used for your logistic model. Note also the use of the EFFECTPLOT statement to plot the predicted log odds or predicted probabilities.
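To make the distinction concrete, here is a sketch of the model implied by the code in the question, assuming stage is binary and SES level 1 is the reference. The symbols $\beta_0,\dots,\beta_3$, $d$ (distance) and the indicator $S$ are notation introduced here for illustration, not taken from the SAS output.

```latex
\begin{aligned}
\operatorname{logit} p(S, d) &= \beta_0 + \beta_1 S + \beta_2 d + \beta_3 S d,
  \qquad S = \mathbf{1}[\text{SES} = 2], \\
\text{slope in } d &= \beta_2 \ \text{ for SES} = 1,
  \qquad \beta_2 + \beta_3 \ \text{ for SES} = 2, \\
\Delta(d) &= \operatorname{logit} p(1, d) - \operatorname{logit} p(0, d)
  = \beta_1 + \beta_3 d, \\
p(S, d) &= \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 S + \beta_2 d + \beta_3 S d)\}} .
\end{aligned}
```

The interaction test asks whether $\beta_3 = 0$, that is, whether the two slopes differ. Each LSMEANS comparison with DIFF asks a different question: whether $\Delta(d) = 0$ at one chosen value of $d$. A significant $\beta_3$ can therefore coexist with non-significant differences at particular distances, for example when the chosen values lie near $d = -\beta_1/\beta_3$, where the two log-odds lines cross. Because $p$ is a nonlinear function of the linear predictor, a given gap in log odds also corresponds to different gaps in probability depending on where on the curve it falls.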