Bota2024
Calcite | Level 5

Hello,

 

According to https://support.sas.com/kb/46/997.html, PROC GENMOD can be used to fit a repeated measures logistic model to the individual level data.

 

In the example, 100 husband and wife pairs were asked a question that could be answered Yes or No.  We are to estimate the probability difference between husband and wife.

 

member = 1 if husband, 0 if wife

id: the husband and wife identifier

 

The model can be fitted with either the logit link or the identity link. With the logit link, the probability difference can be obtained from the NLMeans macro: %NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference).

 

Logit link:

proc genmod data=indiv;
  class id member;
  model response(event='1') = member / dist=binomial;
  repeated subject=id;
  lsmeans member / ilink cl e;
  ods output coef=c;
  store geemod;
run;
%NLMeans(instore=geemod, coef=c, link=logit, title=Pr(Yes) Difference)

Identity link:

proc genmod data=indiv;
  class id member;
  model response(event='1') = member / dist=binomial link=identity;
  repeated subject=id;
  lsmeans member / diff cl;
run;

 

Comparing these two approaches, what are the pros and cons of using the logit link versus the identity link?

2 REPLIES
StatDave
SAS Super FREQ

The main issue is that the linear probability model, which uses the identity link, often cannot fit binary response data well and might not fit successfully at all. That is suggested in the Note at the very end of the usage note you refer to. With the identity link, the predicted probabilities can fall outside the valid [0,1] range, and the fitting algorithm can fail to converge. This is particularly likely if the data are such that some of the probabilities are near 0 or 1. The linear probability model might be okay if all of the probabilities are in the middle range, such as between 0.25 and 0.75, where the curve of a logistic model is roughly straight. But generally, the logit link is the best way to go to properly deal with binomially distributed data.
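To put rough numbers on the two points above, here is a minimal sketch in Python (used only as a calculator outside SAS). It measures how far the logistic curve strays from a straight line over a middle range versus a wide range of probabilities, and shows a linear prediction leaving [0,1]. The continuous predictor x and all coefficient values are hypothetical; the husband/wife example has only the binary member effect.

```python
import math

def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Logit: maps a probability to the log odds."""
    return math.log(p / (1.0 - p))

def chord_gap(p_lo, p_hi, steps=200):
    """Largest vertical gap between the logistic curve and the straight
    line joining its points at probabilities p_lo and p_hi."""
    eta_lo, eta_hi = logit(p_lo), logit(p_hi)
    slope = (p_hi - p_lo) / (eta_hi - eta_lo)
    gap = 0.0
    for i in range(steps + 1):
        eta = eta_lo + (eta_hi - eta_lo) * i / steps
        line = p_lo + slope * (eta - eta_lo)
        gap = max(gap, abs(inv_logit(eta) - line))
    return gap

# Middle range: the logistic curve is close to a straight line.
mid_gap = chord_gap(0.25, 0.75)
# Wide range: the curvature is substantial.
wide_gap = chord_gap(0.02, 0.98)

# Hypothetical models with a continuous predictor x, evaluated at x = 3.
x = 3
linear_pred = 0.90 + 0.05 * x               # identity link: not bounded
logit_pred = inv_logit(logit(0.90) + 0.5 * x)  # logit link: always in (0, 1)

print("max gap, 0.25 to 0.75:", round(mid_gap, 4))
print("max gap, 0.02 to 0.98:", round(wide_gap, 4))
print("identity-link prediction:", linear_pred)
print("logit-link prediction:", round(logit_pred, 4))
```

The gap in the middle range comes out far smaller than the gap in the wide range, which is the sense in which a linear probability model can be adequate when no probability is close to 0 or 1.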

Mike_N
SAS Employee

I agree with the post from @StatDave. I will add that one of the reasons people sometimes use an identity link to model binary response data is that you are directly modeling the probability of the event. That is, the model coefficients can be interpreted as estimates of the difference in the probability of the event. For the logit link, by contrast, you are modeling the log odds of the event. The model coefficients are therefore estimates of log odds ratios, and some people find log odds ratios hard to interpret.
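As a small numeric illustration of the difference in interpretation, the sketch below (Python, used only for the arithmetic) starts from hypothetical logit-link coefficients with wife as the reference level. These values are made up, not estimates from the usage note's data, and the reference level PROC GENMOD actually picks depends on the CLASS parameterization. The sketch converts the coefficients to the two probabilities and then writes down the identity-link coefficients that reproduce the same probabilities.

```python
import math

def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical logit-link coefficients (assumed values, wife = reference).
b0 = -0.4   # log odds of Yes for wives
b1 = 0.9    # log odds ratio, husband vs wife

p_wife = inv_logit(b0)
p_husband = inv_logit(b0 + b1)

# Logit link: the coefficient is a log odds ratio; the probability
# difference has to be computed by back-transforming both means.
odds_ratio = math.exp(b1)
diff_from_logit = p_husband - p_wife

# Identity link: with a single binary predictor, the model that gives
# the same two probabilities has the difference itself as its coefficient.
a0 = p_wife               # intercept = Pr(Yes) for wives
a1 = p_husband - p_wife   # coefficient = difference in Pr(Yes)

print("Pr(Yes), wife:", round(p_wife, 4))
print("Pr(Yes), husband:", round(p_husband, 4))
print("odds ratio:", round(odds_ratio, 4))
print("probability difference:", round(diff_from_logit, 4))
```

The back-transformation in the middle of the sketch is the step that the NLMeans macro handles for the logit-link fit, along with a standard error and confidence limits, which this sketch does not attempt.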
