Cluster errors in Proc GLM

Bright — Wed, 01 Mar 2023 19:29:29 GMT

Hi,

I am running a regression model with PROC GLM and want to cluster the errors at the MSA (Metropolitan Statistical Area) level. Is there an option in GLM that does this?

Thanks!

My code:

proc glm data=data;
  class Year Health_Plan Gender MSA;
  model ln_cost = Year Health_Plan Age Gender riskscore MSA / solution;
run;

Re: Cluster errors in Proc GLM

sbxkoenk — Sun, 05 Mar 2023 00:54:50 GMT

Hello,

Do you want to obtain clustered standard errors at the MSA level?

Not possible with PROC GLM, but maybe you can consider "absorption" (the ABSORB statement).

https://go.documentation.sas.com/doc/en/statug/15.3/statug_glm_syntax02.htm
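
A minimal sketch of what absorption could look like with the variables from the original post. It assumes the data set is first sorted by MSA, which ABSORB requires. Note that this absorbs the MSA effect (so no MSA estimates are printed) rather than clustering the standard errors:

```sas
/* ABSORB requires the input data to be sorted by the absorbed variable */
proc sort data=data;
  by MSA;
run;

proc glm data=data;
  absorb MSA;                      /* MSA is absorbed, so it is left out of CLASS and MODEL */
  class Year Health_Plan Gender;
  model ln_cost = Year Health_Plan Age Gender riskscore / solution;
run;
quit;
```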

Otherwise, turn to PROC MIXED.

(PROC MIXED estimates the fixed effects by generalized least squares (GLS). PROC GLM fits a fixed-effects model by ordinary least squares (OLS).)

PROC MIXED adjusts the standard errors for the fixed effects when you have a RANDOM statement in the model. The standard error for the fixed effect is calculated here using both the residual variance and the variance of the random effect, so the standard error is adjusted for the clustering.

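proc mixed empirical;         /* EMPIRICAL requests sandwich (robust) standard errors */
  class MSA;
  model y = x1 x2 x3 / s;     /* S (SOLUTION) prints the fixed-effect estimates */
  random int / subject=MSA;   /* random intercept for each MSA */
run;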

Or use PROC PANEL with the CLUSTER option if you have time-series cross-sectional data.
(I see you have YEAR as an input.)

Or use PROC SURVEYREG with the CLUSTER statement if you have survey data.
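
A rough sketch of the SURVEYREG route using the variable names from the original post. Whether it suits data that did not come from a survey design is a judgment call, so treat this as an illustration only:

```sas
proc surveyreg data=data;
  class Year Health_Plan Gender;
  cluster MSA;                     /* standard errors are computed with MSA as the cluster */
  model ln_cost = Year Health_Plan Age Gender riskscore / solution;
run;
```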

Koen

Re: Cluster errors in Proc GLM

StatDave — Sun, 05 Mar 2023 03:44:46 GMT

If the data consist of clusters of correlated observations as defined by your MSA variable, then you probably just want a Generalized Estimating Equations (GEE) model, which you can fit with PROC GEE. It properly accounts for the correlation within the clusters and provides tests of the effects of the predictors at the population level. For example,

proc gee data=data;
  class Year Health_Plan Gender MSA;
  model ln_cost = Year Health_Plan Age Gender riskscore;
  repeated subject=MSA;   /* clusters are defined by MSA; the default working correlation is independence */
run;

PROC MIXED with a REPEATED statement could be used if you want a subject-specific model that can provide predictions at the observation level.
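
A sketch of what the MIXED alternative might look like with the same variables. The compound-symmetry structure (TYPE=CS) is an assumption made for illustration and is not something specified in this thread:

```sas
proc mixed data=data;
  class Year Health_Plan Gender MSA;
  model ln_cost = Year Health_Plan Age Gender riskscore / solution;
  repeated / subject=MSA type=cs;  /* TYPE=CS is an assumed within-MSA covariance structure */
run;
```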

However, if your data were collected as part of a survey design, then you should use PROC SURVEYREG to get proper standard errors. Neither GEE nor MIXED is designed to analyze survey data.

Re: Cluster errors in Proc GLM

Bright — Mon, 06 Mar 2023 23:17:36 GMT

Hi,

Thanks for the reply. I do not have panel or survey data. It seems PROC GLM with ABSORB, PROC MIXED, and PROC GENMOD all work. However, the reported standard errors, and hence the significance levels, are very different between GLM and the other two. How do these procedures calculate their standard errors?

Thanks!

Re: Cluster errors in Proc GLM

StatDave — Mon, 06 Mar 2023 23:24:18 GMT

You can still use GEE even if you do not have clusters of observations, and it will still provide robust standard errors. You have to specify the REPEATED statement, but you can create a variable that uniquely identifies each observation and then specify that variable in the SUBJECT= option.
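
A minimal sketch of that approach. The names obs_id and data_id are hypothetical, introduced here only for illustration:

```sas
/* obs_id and data_id are hypothetical names; obs_id gets one unique value per observation */
data data_id;
  set data;
  obs_id = _n_;
run;

proc gee data=data_id;
  class Year Health_Plan Gender obs_id;   /* the SUBJECT= variable must be listed in CLASS */
  model ln_cost = Year Health_Plan Age Gender riskscore;
  repeated subject=obs_id;                /* every observation is its own "cluster" */
run;
```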