Dear all,
My question is: what is the rationale, and what statistical theory does SAS apply, when I use PROC GENMOD with the binomial distribution and link=identity instead of logit?
To compute the difference and its CI, I need a model that adjusts for the correlation within pairs, so I'm using PROC GENMOD.
The dataset passed to the procedure is structured as follows:
-there is one record per Preferred Term per patient
-there are two treatments and a patient can have received both; a patient can be counted twice, because the same patient can have the same Preferred Term once under one treatment and once under the other (so correlation is possible)
-the response variable takes the values Y and N
The program is the following:
proc genmod data=dataset_CI descend;
  by aedecod;
  class usubjid treatn (ref="2");
  model result=treatn / dist=bin link=id;
  repeated subject=usubjid / corr=unstr;
  lsmeans treatn / diff cl;
run;
DESCEND is used so that result=Y is the modeled event; BY aedecod is there because I need a CI for each aedecod; usubjid is in the CLASS statement because it is used in the REPEATED statement; treatn is the numeric treatment variable, which takes two values (1 and 2); result is the response variable (Y/N); and the correlation is unstructured because the observations are not independent.
My question is about the MODEL statement:
I used the binomial distribution, whose default link is logit, BUT in my case the only link that seems correct is link=identity.
What is the rationale, and what statistical theory does SAS apply, when I use PROC GENMOD with the binomial distribution and link=identity instead of logit?
Thank you very much
I'm not sure what you mean by "in my case the only one link that seems correct is the link=identity"... as you say, the default link=logit is the typical (and canonical) link used with the binomial distribution. If you have seen someone use the identity link with the binomial distribution and are wondering why, it is probably because they think they need it to estimate differences of population probabilities. With the default logit link, the DIFF option in the LSMEANS statement produces differences in log odds (logits) rather than differences in probabilities, so some use the identity link as a way around that. But, as you probably know, the identity link does not ensure that the predicted values will be valid probabilities, that is, values between 0 and 1. That is what the logit link ensures. However, you can easily fit a logistic model, using the logit link, and still estimate differences on the probability scale as shown in this note. Additionally, some want to estimate the so-called "difference in difference" on the probability scale and that is illustrated in this note.
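To make the distinction concrete, here is a minimal sketch of the two mean models for a single two-level treatment effect. The notation is my own assumption and is not taken from the SAS documentation: p_1 and p_2 are the probabilities of result=Y under treatn=1 and under the reference level treatn=2, and x is an indicator for treatn=1.

```latex
% Assumed notation: p_{ij} = P(result = Y) for subject i under treatment j,
% x_{ij} = 1 for treatn = 1 and x_{ij} = 0 for the reference level treatn = 2.
\begin{align*}
\text{identity link:}\quad & p_{ij} = \beta_0 + \beta_1 x_{ij}
  && \Rightarrow\quad \beta_1 = p_1 - p_2 \quad \text{(difference in probabilities)} \\
\text{logit link:}\quad & \log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 x_{ij}
  && \Rightarrow\quad \beta_1 = \log\frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} \quad \text{(log odds ratio)} \\
\text{variance function, both links:}\quad & \operatorname{Var}(Y_{ij}) = p_{ij}\,(1 - p_{ij})
\end{align*}
```

In both cases the distribution, and so the variance function, is binomial; only the scale on which the treatment effect is additive changes. Nothing in the identity-link equation keeps beta_0 + beta_1 x inside [0, 1], which is the risk mentioned above. The REPEATED statement changes how the coefficients and their standard errors are estimated (GEE with a working correlation) but not the meaning of beta_1.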
I think
model result=treatn / dist=bin link=id;
is the same as
proc reg;
model result=treatn;
run;
if result takes the values 0 and 1.
But PROC REG does not fit this kind of data as well as PROC LOGISTIC does.
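As a rough check on that comparison, here is a sketch for the simplest case only: two treatment groups, independent 0/1 observations, and no REPEATED statement. The symbols (sample proportions \hat p_j, group sizes n_j, pooled residual variance s^2) are assumptions introduced for illustration.

```latex
% Assumed setting: two groups, independent 0/1 responses, n_j observations in group j,
% \hat p_j = sample proportion of 1s in group j, s^2 = pooled residual variance from OLS.
\begin{align*}
\text{least squares:}\quad
  & \hat\beta_1 = \bar y_1 - \bar y_2 = \hat p_1 - \hat p_2,
  && \widehat{\operatorname{Var}}(\hat\beta_1) = s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \\
\text{binomial, identity link:}\quad
  & \hat\beta_1 = \hat p_1 - \hat p_2,
  && \widehat{\operatorname{Var}}(\hat\beta_1) = \frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}
\end{align*}
```

So in this simple case the point estimate of the difference agrees, but the standard errors, and therefore the confidence limits, generally do not: least squares assumes one common variance, while the binomial model lets the variance depend on each group's probability. With the REPEATED statement in the original program, GENMOD additionally accounts for within-subject correlation, which an ordinary least-squares fit does not do.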