Programming the statistical procedures from SAS

Modeling Correlated Binary Data

Regular Contributor
Posts: 180

Hello all,

 

I have data which looks like this:

 

  • Subject ID (unique identifier)
  • Group (Treatment or Control)
  • Eye (Left / Right)
  • Outcome (Success / Failure)

The data come from a trial testing a new treatment for an eye condition (e.g., cataract).

 

Every subject contributes two observations, one from each eye, so the data are correlated. I have 20 subjects in each group, and thus 40 observations in each group.

 

I am trying to model the data using various methods, and to learn from this about the differences between GLMM and GEE in general, and in SAS in particular.

 

I tried the following:

 

 

/* Model 1: G-side GLMM with a random intercept per subject */
proc glimmix data = Example1;
class Group ID;
model outcome(event='1') = Group / dist = binary solution;
random int / sub = ID;
lsmeans Group / ilink;
run;

/* Model 2: R-side model, unstructured correlation between the two eyes */
proc glimmix data = Example1 ;
class ID Group Eye;
model outcome(event='1') = Group / dist=binary solution;
random Eye / residual type=unr subject=ID ;
lsmeans Group / ilink;
run;

/* Model 3: GEE with an exchangeable (compound symmetry) working correlation */
proc genmod data=Example1 descending;
class Group ID;
model outcome = Group / dist=bin;
repeated subject=ID / type=cs covb corrw;
run;

 

Each model gave somewhat different results. This is a simulation exercise, so I do know the true proportions of success and failure, as well as the true within-subject correlation.

 

I have multiple questions:

 

  1. The basic one: is my code correct? Did I actually use GLMM and GEE properly here?
  2. I understand that I can use PROC GLIMMIX to fit a GEE model. How do I do that? Will I get results similar to GENMOD?
  3. What is the difference between GEE and an "R-side" GLMM? More specifically, what is the difference between my second model and GEE?
  4. Which estimation method should I use when fitting a G-side GLMM? I got different results when going from quad to laplace, and the default (neither of them) was the closest to the real values. Is there a way to compare all sorts of models (different methods, GEE, ...)? I understand that AIC and BIC are not always a reliable basis for comparison.

If you can add any information or details that will help me understand what to use, how, and when, it would be much appreciated. I do know the basic difference between GEE and GLMM (i.e., marginal vs. conditional).

 

Thank you in advance !

SAS Employee
Posts: 51

Re: Modeling Correlated Binary Data

Hello,

 

I don't know much about GEE models, but you should know there is a third procedure that you can use for this (PROC GEE, the GEE Procedure).

The GEE procedure was introduced in SAS/STAT 13.2.

 

Take care if you have a SAS/STAT release prior to 13.2 because then your GEE procedure is experimental!

SAS/STAT(R) 14.1 User's Guide
The GEE Procedure
http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_gee_toc.htm

 

See also this paper:

Paper SAS166-2015

Weighted Methods for Analyzing Missing Data with the GEE Procedure

Guixian Lin and Robert N. Rodriguez, SAS Institute Inc.

https://support.sas.com/resources/papers/proceedings14/SAS166-2014.pdf

 

Cheers,

Koen

 

 

Regular Contributor
Posts: 180

Re: Modeling Correlated Binary Data

Thank you for your quick reply. Unfortunately I do not have access to PROC GEE (yet), so I still need an answer relating to the procedures I wrote about. That said, I am looking forward to trying the new procedure in the future; I think that GEE deserves a procedure of its own.

Valued Guide
Posts: 684

Re: Modeling Correlated Binary Data

A lot of your questions would require long answers to properly explain. I highly recommend you get the great textbook by Walt Stroup (Generalized Linear Mixed Models). You would learn a great deal and learn how to address all your questions.

Regular Contributor
Posts: 180

Re: Modeling Correlated Binary Data

Hi lvm,

 

I do have the book, actually. It is not an easy one, but it is a very good one. If I may, I will try to focus on a couple of smaller questions; maybe you can guide me a little.

 

In the clinical trials setting, is there any logic telling us when we should use each model? For example, is it correct that in prospective randomized trials the results of both models should be roughly the same? Is that so in all cases? And my second question: if I have a model with a binary outcome, one factor variable (treatment vs. control), and another categorical covariate (with 4 levels), and I remove one level from the covariate, how will that affect each model?

 

thank you !

Valued Guide
Posts: 684

Re: Modeling Correlated Binary Data

Two other references that may be easier for you:

the book by Ed Gbur and co-authors (Analysis of Generalized Linear Mixed Models)

the article on GLMMs by Walter Stroup in Agronomy Journal (published in 2014, I believe).

These are for the agricultural sciences, but should be much clearer for you.

 

The conditional model (one without a scale parameter, where overdispersion is handled by adding a random effect for the lowest level in the hierarchy) properly targets the probability of a trait for the subject (the lowest level in the hierarchy) as a function of the predictor variables. The GEE model (or, in general, any model that handles overdispersion by rescaling all the SEs) targets the marginal distribution: the mean for the observations over all the subjects.

The variance among subjects, in addition to the probability parameter, determines the mean proportion. Only for normal distributions are these two the same thing. With binary or binomial data, when the probability is less than 1/2, the marginal mean proportion for the observations is larger than the conditional probability. Stroup makes the argument that researchers usually want the probability from the conditional model, although there are exceptions.

Since the target of the inference is not the same, one does not get the same result from the two approaches. The results can be similar, but they are not expected to be overly similar; it depends on the variances and covariances. There is a great section in chapter 3 of the Stroup book about this (conditional vs. marginal inference). It is perhaps the most important conceptual part of the book.
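
To make the conditional-versus-marginal distinction concrete, here is a short sketch. It assumes a logit link and a normally distributed random subject intercept; the symbols (\eta_{ij} for the linear predictor, u_i for the subject effect, \sigma^2 for its variance) are introduced here for illustration and do not come from any particular fitted model. The last line is a commonly used approximation, not an exact identity.

```latex
\begin{align*}
\text{Conditional:}\quad
  & \Pr(Y_{ij}=1 \mid u_i) = \frac{1}{1+e^{-(\eta_{ij}+u_i)}},
  \qquad u_i \sim N(0,\sigma^2) \\[4pt]
\text{Marginal:}\quad
  & \Pr(Y_{ij}=1) = \int_{-\infty}^{\infty}
    \frac{1}{1+e^{-(\eta_{ij}+u)}}\,
    \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-u^2/(2\sigma^2)}\,du \\[4pt]
\text{Approximation:}\quad
  & \operatorname{logit}\Pr(Y_{ij}=1) \approx
    \frac{\eta_{ij}}{\sqrt{1+c^2\sigma^2}},
  \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588
\end{align*}
```

As a rough illustration: if the conditional probability for a typical subject (u_i = 0) is 0.2, so that \eta_{ij} \approx -1.386, and \sigma^2 = 1, the approximation gives a marginal logit of about -1.19, i.e. a marginal proportion of roughly 0.23. That is larger than 0.2, in line with the statement above for probabilities below 1/2. When \sigma^2 = 0 the two quantities coincide.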

 

Removing a factor will certainly change the model fit and other parameters.
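
On the earlier question about fitting a GEE-type model with PROC GLIMMIX: a minimal sketch, assuming the same data set and variable names as in the original post (Example1, ID, Group, outcome), is to keep the correlation entirely on the R side with RANDOM _RESIDUAL_ and to request sandwich standard errors with the EMPIRICAL option. This has not been run against the poster's data, so treat it as a starting point rather than a verified answer.

```sas
/* GEE-type (marginal) model in PROC GLIMMIX: a sketch, not a verified fit. */
/* EMPIRICAL requests sandwich (robust) standard errors, as GEE reports.    */
proc glimmix data=Example1 empirical;
   class ID Group;
   model outcome(event='1') = Group / dist=binary link=logit solution;
   /* R-side compound-symmetry structure within subject;                    */
   /* no G-side random effects, so inference is marginal.                   */
   random _residual_ / subject=ID type=cs;
   lsmeans Group / ilink;
run;
```

GLIMMIX estimates the R-side covariance parameters by pseudo-likelihood, whereas the REPEATED statement in GENMOD uses moment-based estimates of the working correlation, so the two sets of results should be close but need not match exactly.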

 

 
