Hi,
I am running a regression model with PROC GLM and want to cluster the standard errors at the MSA (Metropolitan Statistical Area) level. Is there an option in PROC GLM that does this?
Thanks!
My code:
proc glm data=data;
class Year Health_Plan Gender MSA;
model ln_cost= Year Health_Plan Age Gender riskscore MSA / solution;
run;
Hello,
Do you want to obtain clustered standard errors at the MSA level?
That is not possible with PROC GLM, but you could consider "absorption" (the ABSORB statement).
https://go.documentation.sas.com/doc/en/statug/15.3/statug_glm_syntax02.htm
Otherwise, turn to PROC MIXED.
(PROC MIXED estimates the fixed effects by generalized least squares (GLS), whereas PROC GLM fits a fixed-effects model by ordinary least squares (OLS).)
PROC MIXED adjusts the standard errors of the fixed effects when the model has a RANDOM statement. The standard error of each fixed effect is then calculated from both the residual variance and the variance of the random effect, so it is adjusted for the clustering. The EMPIRICAL option additionally requests robust ("sandwich") standard errors.
proc mixed empirical;
class MSA;
model y = x1 x2 x3 / s;
random int / subject=MSA;
run;
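Applied to the model in the question, the template above might look like the following. This is only a sketch: the dataset and variable names are copied from the original post, and dropping MSA from the MODEL statement is an assumption (it now defines the clusters instead of entering as a fixed effect).

```sas
/* Sketch only: dataset and variable names come from the original question. */
proc mixed data=data empirical;      /* EMPIRICAL requests sandwich (robust) standard errors */
   class Year Health_Plan Gender MSA;
   /* Assumption: MSA is no longer a fixed effect because it defines the clusters */
   model ln_cost = Year Health_Plan Age Gender riskscore / solution;
   random intercept / subject=MSA;   /* one random intercept per MSA */
run;
```

Removing the EMPIRICAL option gives the model-based standard errors instead, which can be compared with the robust ones.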
Or use PROC PANEL with the CLUSTER option if you have time-series cross-sectional data.
(I see you have YEAR as an input.)
Or use PROC SURVEYREG with the CLUSTER statement if you have survey data.
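A minimal sketch of the SURVEYREG route, assuming the question's dataset and variable names and assuming MSA is left out of the MODEL statement because it is used as the cluster variable:

```sas
/* Sketch only: dataset and variable names come from the original question. */
proc surveyreg data=data;
   class Year Health_Plan Gender;
   cluster MSA;   /* observations within the same MSA form one cluster */
   model ln_cost = Year Health_Plan Age Gender riskscore / solution;
run;
```

With no WEIGHT or STRATA statement, the coefficient estimates should match ordinary least squares, and only the standard errors should reflect the clustering.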
Koen
If the data consist of clusters of correlated observations as defined by your MSA variable, then you probably just want a Generalized Estimating Equations (GEE) model, which you can fit with PROC GEE. It properly accounts for the correlation within the clusters and provides tests of the effects of the predictors at the population level. For example,
proc gee data=data;
class Year Health_Plan Gender MSA;
model ln_cost= Year Health_Plan Age Gender riskscore;
repeated subject=MSA;
run;
PROC MIXED with a REPEATED statement could be used if you want a subject-specific model that can provide predictions at the observation level.
However, if your data were collected as part of a survey design, then you should use PROC SURVEYREG to get proper standard errors. Neither GEE nor MIXED is designed to analyze survey data.
Hi,
Thanks for the reply. I do not have panel or survey data. It seems that PROC GLM with ABSORB, PROC MIXED, and PROC GENMOD all work. However, the reported standard errors, and hence the significance levels, are very different between GLM and the other two. How does each of these procedures calculate its standard errors?
Thanks!