New Contributor
Posts: 3

# How do I adjust for clustered data in logistic regression?

I am using proc logistic to investigate the association between the variables laek and pv (indexar, alder, arv, and koen are confounders). The model looks something like this:

proc logistic data=dataset;
class indexar (ref='2010') koen (ref='K') / param = ref ;
model laek(EVENT='1') = pv indexar alder  arv koen;
run;

The individual-level dataset that I am using comes from patients who are grouped within different clinics. I believe that the clinic effect is a mediator rather than a confounder. How can I adjust for clustering? Should I use a "sandwich" estimator, and if so, how do I do that in SAS?

Super User
Posts: 10,200

## Re: How do I adjust for clustered data in logistic regression?

Try a generalized linear mixed model.

See PROC GLIMMIX.

Frequent Contributor
Posts: 98

## Re: How do I adjust for clustered data in logistic regression?

Depending on the exact type of inference you are interested in, you can account for such clustering in a number of ways. The two simplest are probably PROC GENMOD and PROC GLIMMIX. Depending on the details of the analysis, you can also use PROC SURVEYLOGISTIC or even PROC PHREG, or restructure your data to use a conditional maximum likelihood approach in PROC LOGISTIC, but I think these approaches are less intuitive and more complicated.

For example, you can fit a generalized estimating equation (GEE):

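PROC GENMOD data=dataset;
/* the SUBJECT= variable must also be listed in the CLASS statement */
class patient_id indexar (ref='2010') koen (ref='K') / param = ref ;
/* dist=bin and link=logit request a logistic model; GENMOD otherwise assumes a normal response */
model laek(EVENT='1') = pv indexar alder arv koen / dist=bin link=logit;
repeated subject=patient_id / type=un;
run;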
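
In the original question the clusters are clinics, so a GEE that treats each clinic as the independent unit may be closer to what was asked. The sketch below assumes a clinic identifier named clinic_id (a hypothetical variable name, not given in the question) and swaps in an exchangeable working correlation, since an unstructured matrix needs one parameter per pair of observations within a cluster and is rarely estimable when clinics hold many patients.

```sas
/* Sketch only: clinic_id is an assumed name for the clinic identifier */
proc genmod data=dataset;
  /* the cluster variable has to appear in CLASS to be used in SUBJECT= */
  class clinic_id indexar (ref='2010') koen (ref='K') / param=ref;
  model laek(event='1') = pv indexar alder arv koen / dist=bin link=logit;
  /* exchangeable working correlation: one common within-clinic correlation */
  repeated subject=clinic_id / type=exch;
run;
```

The standard errors GENMOD reports for a model with a REPEATED statement are empirical ("sandwich") estimates by default; adding the MODELSE option to the REPEATED statement also prints model-based ones for comparison.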

You can also fit an ordinary logistic model, but with empirical ("sandwich") variance estimators:

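PROC GLIMMIX data=dataset empirical;
class indexar koen;
/* dist=binary requests a logistic model; without it GLIMMIX treats a numeric response as normal */
/* solution prints the fixed-effect estimates */
model laek(event='1') = pv indexar alder arv koen / dist=binary link=logit solution;
run;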
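
PROC SURVEYLOGISTIC, mentioned above as an alternative, is one way to make the sandwich estimator explicitly cluster-aware: its CLUSTER statement names the sampling unit, and the variance is then estimated by Taylor linearization across clusters. This is a minimal sketch that assumes a clinic identifier named clinic_id (hypothetical, not given in the question) and no sampling weights or strata.

```sas
/* Sketch only: clinic_id is an assumed name for the clinic identifier */
proc surveylogistic data=dataset;
  /* observations sharing a clinic_id are treated as one cluster */
  cluster clinic_id;
  class indexar (ref='2010') koen (ref='K') / param=ref;
  model laek(event='1') = pv indexar alder arv koen;
run;
```

Without weights, the point estimates should match those from the original PROC LOGISTIC call; only the standard errors, and hence the confidence intervals and p-values, are expected to change.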

You can fit a mixed effects model:

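PROC GLIMMIX data=dataset;
class indexar koen;
/* dist=binary requests a logistic model; solution prints the fixed-effect estimates */
model laek(event='1') = pv indexar alder arv koen / dist=binary link=logit solution;
/* one random intercept per level of the SUBJECT= variable */
random intercept / subject=patient_id;
run;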

For both the RANDOM statement in GLIMMIX and the REPEATED statement in GENMOD, you can also specify a hierarchical (nested) structure, e.g. "random intercept / subject=patient_id(clinic_id)" for patients nested within clinics. Before doing so, I would consult this note (http://support.sas.com/kb/24/200.html) and/or this paper (https://support.sas.com/resources/papers/proceedings14/SAS026-2014.pdf) to make sure you understand exactly what you are specifying and how to interpret it.
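
As a rough illustration of the nested specification, the sketch below adds a random intercept for each clinic and another for each patient within a clinic. The variable names clinic_id and patient_id are assumptions about the dataset, and the patient-level term is only estimable if each patient contributes more than one observation; with one row per patient, the clinic-level RANDOM statement alone would be the relevant part.

```sas
/* Sketch only: clinic_id and patient_id are assumed variable names */
proc glimmix data=dataset;
  class clinic_id patient_id indexar koen;
  model laek(event='1') = pv indexar alder arv koen
        / dist=binary link=logit solution;
  /* between-clinic variation in the baseline log-odds */
  random intercept / subject=clinic_id;
  /* between-patient variation within a clinic (needs repeated rows per patient) */
  random intercept / subject=patient_id(clinic_id);
run;
```

The fixed-effect estimates from this model are conditional on the random effects (clinic-specific), so they are not directly comparable to the population-averaged estimates from the GEE approach.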
