☑ This topic is solved.
Bright
Obsidian | Level 7

Hi,

I am running a regression model with PROC GLM and want to cluster the errors at the MSA (Metropolitan Statistical Area) level. Is there an option in GLM that does this?

Thanks!

My code:

proc glm data=data;
   class Year Health_Plan Gender MSA;
   model ln_cost = Year Health_Plan Age Gender riskscore MSA / solution;
run;

 

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

 

Do you want to obtain clustered standard errors at the MSA level?

 

That is not possible with PROC GLM, but you could consider "absorption" (the ABSORB statement).

https://go.documentation.sas.com/doc/en/statug/15.3/statug_glm_syntax02.htm
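
As a rough sketch of what absorption could look like with the variables from the question (the output data set name data_sorted is made up for illustration): the absorbed variable moves out of the CLASS and MODEL statements into an ABSORB statement, and the data must be sorted by it first. Note that this only adjusts for the MSA effects; the reported standard errors are still the usual OLS ones, not cluster-robust ones.

```sas
/* ABSORB requires the input data to be sorted by the absorbed variable. */
/* data_sorted is a hypothetical name used only for this sketch.         */
proc sort data=data out=data_sorted;
   by MSA;
run;

proc glm data=data_sorted;
   absorb MSA;                      /* MSA is absorbed, so it is not listed in CLASS or MODEL */
   class Year Health_Plan Gender;
   model ln_cost = Year Health_Plan Age Gender riskscore / solution;
run;
quit;
```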

 

Otherwise, turn to PROC MIXED.

PROC MIXED uses generalized least squares (GLS) to estimate the fixed effects, whereas PROC GLM uses ordinary least squares (OLS) to fit a fixed-effects model.

PROC MIXED adjusts the standard errors for the fixed effects when you have a RANDOM statement in the model. The standard error for the fixed effect is calculated here using both the residual variance and the variance of the random effect, so the standard error is adjusted for the clustering.

proc mixed empirical;
  class MSA;
  model y = x1 x2 x3 / s;
  random int / subject=MSA;
run;
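
To make the description above a bit more concrete, here is a sketch of the standard textbook formulas. The notation is an assumption for illustration and is not taken from the SAS documentation: $i$ indexes the clusters (here, MSAs), $n_i$ is the number of observations in cluster $i$, $\sigma_u^2$ is the random-intercept variance, and $\sigma_e^2$ is the residual variance.

```latex
% OLS (PROC GLM): model-based covariance, assumes independent errors
\widehat{\mathrm{Var}}(\hat\beta_{\mathrm{OLS}}) = \hat\sigma_e^2 \,(X^\top X)^{-1}

% Random intercept per cluster: implied covariance of the n_i responses in cluster i
V_i = \sigma_u^2 \,\mathbf{1}_{n_i}\mathbf{1}_{n_i}^\top + \sigma_e^2 \, I_{n_i}

% GLS estimate and its model-based covariance (PROC MIXED without EMPIRICAL)
\hat\beta = A^{-1} \sum_i X_i^\top \hat V_i^{-1} y_i ,
\qquad A = \sum_i X_i^\top \hat V_i^{-1} X_i ,
\qquad \widehat{\mathrm{Var}}(\hat\beta) = A^{-1}

% Empirical ("sandwich") covariance requested by the EMPIRICAL option
\widehat{\mathrm{Var}}_{\mathrm{emp}}(\hat\beta)
  = A^{-1} \Bigl( \sum_i X_i^\top \hat V_i^{-1} \hat e_i \hat e_i^\top \hat V_i^{-1} X_i \Bigr) A^{-1} ,
\qquad \hat e_i = y_i - X_i \hat\beta
```

The OLS formula ignores within-cluster correlation entirely, which is one reason standard errors from PROC GLM can differ noticeably from those of the clustered approaches.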

 

Or use PROC PANEL with the CLUSTER option if you have time-series cross-sectional data (I see you have YEAR as an input).

 

Or use PROC SURVEYREG with the CLUSTER statement if you have survey data.
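
A minimal sketch of the SURVEYREG variant, adapted to the variables in the question. It assumes that MSA plays the role of the cluster and is therefore dropped from the MODEL statement; whether that is appropriate depends on your design.

```sas
proc surveyreg data=data;
   class Year Health_Plan Gender;
   cluster MSA;                     /* observations within the same MSA are treated as one cluster */
   model ln_cost = Year Health_Plan Age Gender riskscore / solution;
run;
```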

 

Koen

StatDave
SAS Super FREQ

If the data consist of clusters of correlated observations as defined by your MSA variable, then you probably just want a Generalized Estimating Equations (GEE) model which you can fit with PROC GEE. It properly accounts for the correlation within the clusters and provides tests of the effects of the predictors at the population level. For example, 

proc gee data=data;
class Year Health_Plan Gender MSA;
model ln_cost= Year Health_Plan Age Gender riskscore;
repeated subject=MSA;
run;

PROC MIXED with a REPEATED statement could be used if you want a subject-specific model that can provide predictions at the observation level.

 

However, if your data were collected as part of a survey design, then you should use PROC SURVEYREG to get proper standard errors. Neither GEE nor MIXED is designed to analyze survey data.

Bright
Obsidian | Level 7

Hi,

Thanks for the reply. I do not have panel or survey data. It seems that PROC GLM with ABSORB, PROC MIXED, and PROC GENMOD all work. However, the reported standard errors, and hence the significance levels, are very different between GLM and the other two. How are the standard errors calculated in these procedures?

Thanks!

StatDave
SAS Super FREQ
You can still use GEE even if you do not have clusters of observations, and it will still provide robust standard errors. You have to specify the REPEATED statement, but you can create a variable that uniquely identifies each observation and then specify that variable in the SUBJECT= option.
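
For instance, something along these lines should work. The data set name data_id and the variable name obs_id are made up for this sketch; the model is the one from the earlier example.

```sas
/* Add a variable that uniquely identifies each observation. */
/* data_id and obs_id are hypothetical names.                */
data data_id;
   set data;
   obs_id = _n_;                    /* _n_ is the DATA step iteration counter */
run;

proc gee data=data_id;
   class Year Health_Plan Gender obs_id;   /* the SUBJECT= variable must appear in CLASS */
   model ln_cost = Year Health_Plan Age Gender riskscore;
   repeated subject=obs_id;                /* each observation is its own "cluster" */
run;
```

With a large data set, putting obs_id in the CLASS statement creates one level per observation, so expect this to use more memory than the clustered version.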
