Re: Variance estimation in multilevel models [glimmix]

DocHP · Posted 07-15-2014 11:08 AM

I have a question about robust variance estimation with multilevel models.

Some context: I am using survey data with 50 strata [varname: region]. Household clusters [varname: house] were randomly selected within strata, and all household members were surveyed. Region is a variable of interest that I would like to model as a random effect; household is a nuisance variable. My outcome is binary [varname: outcome]. For simplicity's sake, I have one independent variable [varname: var1]. Weighting is not important.

Here's what I have so far:

proc glimmix;
  class region house var1;
  model outcome=var1/dist=binary link=logit;
  random intercept /id=region;
run;

I cannot figure out how to model the clustering within households. I understand that I can add the 'empirical' option to the proc statement to get robust sandwich estimators. Is this what I want, or am I off base?

Thanks!

SteveDenham · Posted 07-16-2014 08:00 AM

If house is nested in region (seems a logical assumption to me), then the following might work:

proc glimmix method=laplace empirical;
  class region house var1;
  model outcome=var1/dist=binary link=logit;
  random intercept region/id=house;
run;

This will result in outcome being conditional on the random effects. Be sure to sort the data by region and house.
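A minimal sketch of that sort step, assuming the input data set is named survey (a placeholder, not a name from this thread):

```sas
/* Sort so that all rows for one house within one region are contiguous. */
/* "survey" is a placeholder data set name.                              */
proc sort data=survey;
    by region house;
run;
```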

If this runs into convergence problems, you may want to convert from binary to binomial by switching to the events/trials syntax, with counts aggregated per house.
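One way to build the events/trials counts, sketched under the assumption that outcome is coded 0/1 and that the input data set is named survey. The names survey, house_counts, n_events and n_trials are all illustrative placeholders. var1 is kept in the grouping so it stays available as a predictor.

```sas
/* Collapse person-level rows to one row per region/house/var1 combination. */
/* Assumes outcome is numeric 0/1; rows with missing outcome are not counted. */
proc sql;
    create table house_counts as
    select region,
           house,
           var1,
           sum(outcome)   as n_events,   /* number of 1s           */
           count(outcome) as n_trials    /* number of non-missing  */
    from survey
    group by region, house, var1
    order by region, house, var1;
quit;
```

The MODEL statement would then read model n_events/n_trials = var1 / dist=binomial link=logit; with the rest of the step unchanged.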

Steve Denham

DocHP · Posted 07-16-2014 07:09 PM

Thanks. I assume you mean subject=house instead of id=house? [It doesn't run otherwise.] With id replaced by subject, the syntax is accepted:

proc glimmix data=events oddsratio method=laplace empirical;
  class region house var1;
  model hcf4/n=var1 /dist=binomial link=logit;
  random intercept region /subject=house;
run;

BUT I still get an error message, even using the events/trials syntax:

ERROR: Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model.

This is a pretty big data set, with more than 100,000 households. Any other suggestions?

Thanks!

SteveDenham · Posted 07-17-2014 07:55 AM

Yeah, good catch on the subject= option.

OK, with >100,000 households, I suspect you are going to have to collapse the data (or work on a larger system). However, if house is a numeric variable, you might be able to get this to work if you remove house from the CLASS statement. I know the RANDOM statement and the subject= option will take continuous variables.
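A sketch of that change, reusing the data set and variable names from the previous post (events, hcf4, n). It is untested on data of this size, so it may or may not get past the "too large" error. The sort step rests on the assumption that, with house out of the CLASS statement, the rows for each house need to be contiguous.

```sas
/* Assumption: records for one house must be adjacent when house is */
/* not a CLASS variable, so sort first.                             */
proc sort data=events;
    by region house;
run;

proc glimmix data=events oddsratio method=laplace empirical;
    class region var1;                      /* house removed from CLASS */
    model hcf4/n = var1 / dist=binomial link=logit;
    random intercept region / subject=house;
run;
```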

Steve Denham
