Hello,
According to https://support.sas.com/kb/46/997.html, PROC GENMOD can be used to fit a repeated measures logistic model to the individual-level data.
In the example, 100 husband-wife pairs were asked a question that could be answered Yes or No. We want to estimate the difference in the probability of answering Yes between husbands and wives.
member: 1 if husband, 0 if wife
id: identifier for each husband-wife pair
The model can be fitted with either a logit link or an identity link. With the logit link, the probability difference can be obtained from %NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference).
Logit link:
proc genmod data=indiv;
  class id member;
  model response(event='1') = member / dist=binomial;
  repeated subject=id;
  lsmeans member / ilink cl e;
  ods output coef=c;
  store geemod;
run;
%NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference)
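For reference, here is a sketch of the delta-method calculation that a probability difference from a logit model relies on. The notation is my own: η̂_h and η̂_w are the logit-scale LS-means for husbands and wives, and Σ̂ is their estimated covariance from the GEE fit (the LS-means coefficients saved in C applied to the robust covariance of the parameter estimates). I am assuming the macro uses this standard approach; its documentation is the authority on the details.

```latex
\begin{aligned}
\hat D &= \operatorname{expit}(\hat\eta_h) - \operatorname{expit}(\hat\eta_w) = \hat p_h - \hat p_w,
  \qquad \operatorname{expit}(\eta) = \frac{1}{1+e^{-\eta}} \\
g &= \left(\frac{\partial D}{\partial \eta_h},\ \frac{\partial D}{\partial \eta_w}\right)^{\top}
   = (a,\ -b)^{\top},
  \qquad a = \hat p_h(1-\hat p_h),\quad b = \hat p_w(1-\hat p_w) \\
\widehat{\operatorname{Var}}(\hat D) &\approx g^{\top}\hat\Sigma\, g
   = a^2\hat\sigma_{hh} + b^2\hat\sigma_{ww} - 2ab\,\hat\sigma_{hw} \\
95\%\ \text{CI} &: \quad \hat D \pm 1.96\sqrt{\widehat{\operatorname{Var}}(\hat D)}
\end{aligned}
```

The covariance term σ̂_hw matters here because husband and wife in the same pair are correlated, and the REPEATED statement's robust covariance is what carries that correlation into the standard error.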
Identity link:
proc genmod data=indiv;
  class id member;
  model response(event='1') = member / dist=binomial link=identity;
  repeated subject=id;
  lsmeans member / diff cl;
run;
Comparing these two approaches, what are the pros and cons of using the logit link versus the identity link?
The main issue is that the linear probability model (the identity link) often fits binary response data poorly, and the fit might not succeed at all. The Note at the very end of the usage note you refer to suggests this. With the identity link, nothing constrains the predicted probabilities to the valid [0,1] range, so predictions can fall outside it and the fitting algorithm can fail to converge. This is particularly likely when some of the event probabilities are near 0 or 1. The linear probability model might be okay if all of the probabilities are in the middle range, such as between 0.25 and 0.75, where the logistic curve is roughly straight. But in general, the logit link is the better way to deal properly with binomially distributed data.
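To see why the middle range works and the extremes do not, one illustration is to compare the logistic curve with its tangent line at p = 0.5. The tangent line is only a stand-in for "a straight line in the probability scale" and is not what GENMOD fits with the identity link. The slope of the logistic curve is p(1 − p), which equals 1/4 at p = 0.5:

```latex
\begin{aligned}
p(\eta) &= \frac{1}{1+e^{-\eta}}, \qquad \frac{dp}{d\eta} = p(1-p)
  \quad\Rightarrow\quad p \approx \tfrac12 + \tfrac{\eta}{4} \ \text{near } \eta = 0 \\
p = 0.25:&\quad \eta = \ln(1/3) \approx -1.099,
  \qquad \tfrac12 + \tfrac{-1.099}{4} \approx 0.225 \\
p = 0.05:&\quad \eta = \ln(0.05/0.95) \approx -2.944,
  \qquad \tfrac12 + \tfrac{-2.944}{4} \approx -0.236
\end{aligned}
```

Between 0.25 and 0.75 the straight line is off by only about 0.025, but near 0 it already gives a negative "probability", which is the kind of out-of-range fitted value the identity link can run into.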
I agree with the post from @StatDave. I will add that one reason people sometimes use an identity link for binary response data is that it models the probability of the event directly, so the model coefficients can be interpreted as estimates of differences in the probability of the event. With the logit link, you are instead modeling the log odds of the event, so the coefficients are estimates of log odds ratios, which some people find hard to interpret.
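A small worked example of that interpretation difference, using made-up probabilities purely for illustration (they are not from the usage note data):

```latex
\begin{aligned}
p_h = 0.6,\ p_w = 0.4:&\quad p_h - p_w = 0.2, \qquad
  \mathrm{OR} = \frac{0.6/0.4}{0.4/0.6} = 2.25, \qquad \ln \mathrm{OR} \approx 0.811 \\
p_h = 0.2,\ p_w = 0.1:&\quad p_h - p_w = 0.1, \qquad
  \mathrm{OR} = \frac{0.2/0.8}{0.1/0.9} = 2.25, \qquad \ln \mathrm{OR} \approx 0.811
\end{aligned}
```

The same log odds ratio corresponds to different probability differences depending on the baseline probability, so with the logit link the probability difference has to be computed from the fitted model (as %NLMeans does) rather than read directly off a coefficient.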