We used PROC GENMOD with the REPEATED statement to fit a GEE model and estimate the risk difference between the treatment groups for paired proportions. We specified the LINK=IDENTITY option with the binomial distribution so that the estimated differences are differences of population probabilities; the default logit link would instead produce differences in log odds (logits).
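/* Save the LS-means, their differences, and the LSMESTIMATE results as data sets */
ods output Diffs=Diff LSMeans=LSMean LSMEstimates=LSMEstimate;
proc genmod data=adqs;
  class subjid TRTEXPN;
  /* Binomial response with identity link: estimates are on the probability scale */
  model response(event='1') = TRTEXPN / dist=binomial link=identity;
  /* GEE with subjects as clusters and an unstructured working correlation */
  repeated subject=subjid / type=un;
  lsmeans TRTEXPN / diff cl;
  /* One-sided (upper-tailed) test of the difference against -0.1 at alpha=0.025 */
  lsmestimate TRTEXPN 'Princess (1) Vs Juvéderm (2)' 1 -1 / cl upper testvalue=-0.1 alpha=0.025;
run;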
We received the question below from the regulatory agency. To help us respond, could you please share the mathematical formula for the variance of the difference in estimated responder rates when LINK=IDENTITY is used? Also, could you let us know whether there is an alternative approach in SAS that would confirm the results from the procedure above?
For primary analysis, you used a generalized linear model where the response variable “Responder” was assumed to be binomial. In your genmod model, the link function was identity. It is unclear how the SAS genmod procedure solved the model for the binomial response variable using the identity link. Please provide the mathematical formula for the variance of the difference in estimate responder rates (Ptest-Pcontrol), and provide the mathematical formula for the test statistics of the McNemar type test as well.
As shown in the GENMOD documentation of the LSMEANS statement, the standard error estimate is sqrt(LV(β)L') where V is the estimated variance-covariance matrix of the model parameters and L is the hypothesis matrix which is simply the vector (1 -1) in this case. You could obtain the same result using the ESTIMATE statement. The GENMOD documentation of the ESTIMATE statement shows the same form of the standard error estimate and notes that, for a GEE model, V(β) (written there as Σ) is the empirical estimate of the covariance matrix. The formula for the empirical estimator is shown in the GENMOD documentation in the "Details: Generalized Estimating Equations: Parameter Estimate Covariances" section.
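For reference, the quantities described above can be written out as follows. This is a sketch in generic GEE notation, not a quotation from the documentation: the symbols K (number of subjects), D_i, V_i, Y_i, and μ_i are assumptions introduced here and should be checked against the GENMOD sections cited above.

```latex
% Standard error of a linear combination L\hat\beta of the GEE parameter estimates
\widehat{\mathrm{SE}}(L\hat\beta) = \sqrt{L\,\hat\Sigma_e\,L'}

% Empirical ("sandwich") covariance estimator of \hat\beta
\hat\Sigma_e = I_0^{-1}\, I_1\, I_0^{-1}, \qquad
I_0 = \sum_{i=1}^{K} D_i'\, V_i^{-1} D_i, \qquad
I_1 = \sum_{i=1}^{K} D_i'\, V_i^{-1}\,(Y_i - \hat\mu_i)(Y_i - \hat\mu_i)'\, V_i^{-1} D_i

% D_i = \partial\mu_i / \partial\beta and V_i is the working covariance of Y_i.
% With the identity link, \mu_i = X_i\beta, so D_i = X_i.

% Applied to the two treatment LS-means \hat p_1 and \hat p_2 with contrast (1, -1):
\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2)
  = \widehat{\mathrm{Var}}(\hat p_1) + \widehat{\mathrm{Var}}(\hat p_2)
  - 2\,\widehat{\mathrm{Cov}}(\hat p_1, \hat p_2)
```

The covariance term is what accounts for the pairing: both responses come from the same subject, so the two estimated proportions are not independent.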
Regarding the concern over how the model is fit using the identity link: the GEE estimation algorithm applies regardless of the distribution and link function. This algorithm is shown in the "Details: Generalized Estimating Equations: Fitting Algorithm" section of the GENMOD documentation. There is no requirement to use the canonical link function (which is the logit link for the binomial distribution). However, it is certainly true that the fitting algorithm could fail when the identity link is used with the binomial distribution, because this link function does not ensure that the fitted values are valid probabilities, as is expected for a binomial response. When using the identity link with the binomial distribution, it is therefore important to examine the fit to be sure that proper convergence was obtained. Even if no errors are issued, there could be signs of improper convergence, such as gradient values that are not close to zero (use the ITPRINT option) or large parameter standard errors, which should be quite small for binomial models.
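One way to carry out these checks is sketched below, reusing the data set and variable names from the question. The output data set names (pred_check, bad_fit) and the variable p_hat are illustrative names, not part of the original analysis.

```sas
/* Refit the same GEE model, printing the iteration history (ITPRINT)
   and saving the fitted probabilities for a range check. */
proc genmod data=adqs;
  class subjid TRTEXPN;
  model response(event='1') = TRTEXPN / dist=binomial link=identity itprint;
  repeated subject=subjid / type=un;
  output out=pred_check pred=p_hat;  /* pred_check and p_hat are illustrative names */
run;

/* Keep only fitted values that are not valid probabilities. */
data bad_fit;
  set pred_check;
  if not missing(p_hat) and (p_hat < 0 or p_hat > 1);
run;

/* If nothing is printed, every fitted value lies in [0, 1]. */
proc print data=bad_fit;
run;
```

The iteration history and the parameter standard errors in the GENMOD output still need to be reviewed by eye; the range check covers only the fitted values.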
Alternate methods for estimating and testing the risk difference in matched pairs data are discussed and illustrated in this note. In addition to the model-based approach using macros with a GEE model without the identity link, the note also discusses a non-model-based approach using the COMMONRISKDIFF option in PROC FREQ.
Similarly for non-clustered data, the methods are discussed in this note. The examples do not use a GEE model but the NLMeans, NLEST, or Margins macros can all be used with a GEE model as shown in the above note. The NLMIXED procedure could also be used by adding a random effect for the clusters, though this is not a GEE model.
Both notes also mention the potential problem of using the identity link.
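As a rough sketch of the non-model-based approach, the call below treats each subject as a stratum and requests the common risk difference across strata. It assumes adqs has one record per subject per treatment, as in the GENMOD step; the exact code in the note may differ, and the available suboptions (confidence limit types, tests, and which response level is used as the risk column) should be checked in the PROC FREQ documentation.

```sas
/* Sketch only: one 2x2 table (treatment by response) per subject.
   NOPRINT suppresses the per-subject tables but keeps the requested statistics. */
proc freq data=adqs;
  tables subjid*TRTEXPN*response / commonriskdiff noprint;
run;
```

With one observation per treatment in every stratum, each subject gets the same weight, so the stratified estimate reduces to the average within-subject difference, which is the difference between the two marginal responder rates.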
Thank you for looking into this. The solution provided here helped us check the adequacy of the approach used in our study. I re-ran the analysis using PROC FREQ and the %NLMeans and %Margins macros, and the results match exactly. I have provided the link to this note and the compared results to FDA. Thank you again for your great support. 😊