I have a question about robust variance estimation with multilevel models.
Some context - I am using survey data - there are 50 strata [varname: region]; household clusters were randomly selected within strata [varname: house] and all household members were surveyed. Region is an interesting variable that I would like to model as a random effect. Households are a nuisance variable. My outcome is binary [varname: outcome]. For simplicity's sake, I have one independent variable, [varname: var1]. Weighting is not important.
Here's what I have so far:
proc glimmix;
class region house var1;
model outcome=var1/dist=binary link=logit;
random intercept /id=region;
run;
I cannot figure out how to model the clustering within households. I understand that I can add the 'empirical' option to the proc statement to get robust sandwich estimators - is this what I want? Or am I off base?
Thanks!
If house is nested in region (seems a logical assumption to me), then the following might work:
proc glimmix method=laplace empirical;
class region house var1;
model outcome=var1/dist=binary link=logit;
random intercept region/id=house;
run;
This will result in outcome being conditional on the random effects. Be sure to sort the data by region and house.
If this runs into convergence problems, you may want to convert from binary to binomial by going to the events/trials syntax per house.
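As a rough sketch of that collapse step: the person-level data set name (survey) and the output names (house_counts, n_events, n_trials) are placeholders I made up, and I am assuming outcome is coded 0/1. If var1 is constant within a household, this gives one row per house; otherwise it gives one row per house and var1 level.

```sas
/* Hypothetical names: SURVEY is the person-level input data set,      */
/* HOUSE_COUNTS is the collapsed output. Assumes outcome is coded 0/1. */
proc sql;
  create table house_counts as
  select region,
         house,
         var1,
         sum(outcome)   as n_events,  /* persons with outcome = 1        */
         count(outcome) as n_trials   /* persons with nonmissing outcome */
  from survey
  group by region, house, var1
  order by region, house, var1;       /* keeps the data sorted by region and house */
quit;

/* The MODEL statement would then use events/trials syntax, e.g.: */
/*   model n_events/n_trials = var1 / dist=binomial link=logit;   */
```

The rest of the PROC GLIMMIX step stays as it was; only the data set and the MODEL statement change.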
Steve Denham
Thanks - I assume you mean subject=house instead of id=house? [it doesn't run otherwise] Replacing id with subject, the code works,
proc glimmix data=events oddsratio method=laplace empirical;
class region house var1 ;
model hcf4/n=var1 /dist=binomial link=logit;
random intercept region /subject=house;
run;
BUT.....
I get an error message - even using the events/trials syntax.
ERROR: Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model.
This is a pretty big data set with > 100,000 households. Other suggestions?
Thanks!
Yeah, good catch on the subject= option.
OK, with >100,000 households, I suspect you are going to have to collapse the data (or work on a larger system). However, if house is a numeric variable, you might be able to get this to work if you remove house from the CLASS statement. I know the RANDOM statement and the subject= option will take continuous variables.
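Something along these lines is what I have in mind. Treat it as a sketch: the data set name (survey) is a placeholder, it assumes house is numeric, and the RANDOM statement is copied unchanged from the code above. As far as I understand it, when the subject variable is not in the CLASS statement a new subject starts whenever its value changes from one record to the next, so the sort matters, and house numbers that repeat across regions could be a problem at region boundaries.

```sas
/* Hypothetical data set name: SURVEY. Assumes house is numeric. */
proc sort data=survey;
  by region house;               /* records for one house must be contiguous */
run;

proc glimmix data=survey method=laplace empirical;
  class region var1;             /* house deliberately left out of CLASS */
  model outcome = var1 / dist=binary link=logit;
  random intercept region / subject=house;
run;
```

No promises that this gets past the "model is too large" error, but it avoids building class levels for more than 100,000 households.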
Steve Denham