Modelling binary data using PROC GENMOD via logit link or linear link

Bota2024 · Posted 03-12-2024 05:17 PM

Hello,

According to https://support.sas.com/kb/46/997.html, PROC GENMOD can be used to fit a repeated measures logistic model to individual-level data.

In the example, 100 husband and wife pairs were each asked a question that could be answered Yes or No. The goal is to estimate the difference between husbands and wives in the probability of answering Yes.

member = 1 if husband, 0 if wife

id: identifier for each husband and wife pair

The model can be fitted with either the logit link or the identity link. With the logit link, the probability difference can be obtained from the NLMeans macro: %NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference).

Logit link:

proc genmod data=indiv;
   class id member;
   model response(event='1') = member / dist=binomial;
   repeated subject=id;
   lsmeans member / ilink cl e;
   ods output coef=c;
   store geemod;
run;

%NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference)
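For reference, here is a sketch of the quantity being estimated on the logit scale. The notation is ours, not GENMOD's or the macro's: $\eta_h$ and $\eta_w$ are the LS-means for husbands and wives on the logit scale, and $\hat\Sigma$ is their estimated 2x2 covariance matrix from the GEE fit. The standard error shown uses the delta method, which is one standard approach. We are not claiming this reproduces the macro's internals exactly.

```latex
% Hypothetical notation: \eta_h, \eta_w = LS-means of member on the logit scale
p_h = \frac{1}{1 + e^{-\eta_h}}, \qquad
p_w = \frac{1}{1 + e^{-\eta_w}}, \qquad
\Delta = p_h - p_w

% Delta method: gradient of \Delta with respect to (\eta_h, \eta_w),
% using dp/d\eta = p(1 - p) for the inverse logit
\nabla\Delta = \begin{pmatrix} p_h(1 - p_h) \\ -\,p_w(1 - p_w) \end{pmatrix}, \qquad
\widehat{\operatorname{Var}}(\hat\Delta) \approx \nabla\Delta^{\top}\,\hat\Sigma\,\nabla\Delta
```

Because $\Delta$ is a nonlinear function of the model parameters, its standard error needs a step like this. That extra step is what the macro takes care of.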

Identity link:

proc genmod data=indiv;
   class id member;
   model response(event='1') = member / dist=binomial link=identity;
   repeated subject=id;
   lsmeans member / diff cl;
run;
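With the identity link, the LS-means are already on the probability scale, so the difference reported by the DIFF option is a linear contrast. In the same hypothetical notation as above ($\eta_h$, $\eta_w$, $\hat\Sigma$ are our labels):

```latex
% Identity link: LS-means are probabilities, so the difference is linear
\Delta = \eta_h - \eta_w = c^{\top}\eta, \qquad c = (1, -1)^{\top}

\widehat{\operatorname{Var}}(\hat\Delta) = c^{\top}\hat\Sigma\,c
  = \hat\Sigma_{hh} + \hat\Sigma_{ww} - 2\,\hat\Sigma_{hw}
```

No delta-method step is needed here, which is part of the appeal of the identity link.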

Comparing these two approaches, what are the pros and cons of using the logit link versus the identity link?

StatDave · Posted 03-12-2024 05:27 PM

The main issue is that the linear probability model, which uses the identity link, often cannot fit binary response data well and might fail to fit at all. The Note at the very end of the usage note you refer to suggests this. The identity link does not constrain the predicted values, so estimated probabilities can fall outside the valid [0,1] range, and the fitting algorithm can then fail to converge. This is particularly likely when some of the response probabilities are near 0 or 1. The linear probability model might be adequate if all of the probabilities are in the middle range, such as between 0.25 and 0.75, where the logistic curve is roughly straight. In general, though, the logit link is the best way to properly model binomially distributed data.
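To make the range problem concrete, here is a small illustration. The covariate $x$ and the coefficient values are hypothetical and are not part of the husband/wife example. That example has a single two-level factor, so its fitted probabilities are just the two sample proportions. Out-of-range predictions become a risk once continuous covariates or several effects enter the model. The second part shows why the middle range is roughly linear: the slope of the logistic curve changes little there and changes sharply near 0 and 1.

```latex
% Identity link with a hypothetical continuous covariate x (illustration only)
p(x) = \beta_0 + \beta_1 x, \quad \beta_0 = 0.1,\ \beta_1 = 0.05,\ x = 20
\;\Rightarrow\; p(20) = 0.1 + 0.05 \times 20 = 1.1 > 1

% Logit link: predictions always lie in (0, 1)
p(\eta) = \frac{1}{1 + e^{-\eta}} \in (0, 1), \qquad
\frac{dp}{d\eta} = p(1 - p)

% Slope on the probability scale:
%   p = 0.50         -> 0.25
%   p = 0.25 or 0.75 -> 0.1875
%   p = 0.05         -> 0.0475
```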

Mike_N · Posted 03-13-2024 12:00 PM

I agree with @StatDave. I will add that one reason people sometimes use the identity link for binary response data is that it models the probability of the event directly. That is, the model coefficients can be interpreted as estimates of differences in the probability of the event. With the logit link, by contrast, you are modeling the log odds of the event, so the coefficients are estimates of log odds ratios, and some people find log odds ratios hard to interpret.
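A worked example helps show the difference between the two scales. The probabilities below are hypothetical and are not results from the usage note. The second line shows that the same odds ratio corresponds to a different probability difference when the baseline probability changes.

```latex
% Hypothetical probabilities, for illustration only
p_h = 0.6,\ p_w = 0.4:\quad
p_h - p_w = 0.20,\quad
\mathrm{OR} = \frac{0.6/0.4}{0.4/0.6} = \frac{1.5}{2/3} = 2.25,\quad
\log\mathrm{OR} \approx 0.811

% Same odds ratio, lower baseline probability
p_w = 0.1:\quad
\text{odds}_w = \tfrac{0.1}{0.9} = \tfrac{1}{9},\quad
\text{odds}_h = 2.25 \times \tfrac{1}{9} = 0.25,\quad
p_h = \frac{0.25}{1.25} = 0.20,\quad
p_h - p_w = 0.10
```

The log odds ratio stays at about 0.811 while the probability difference halves from 0.20 to 0.10. This is why the two links answer differently phrased questions.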
