Dear All,
I was wondering how I can run a fixed-effects regression with clustered standard errors. I have panel data on individuals, each observed multiple times. I would like to run the regression with individual fixed effects and standard errors clustered by individual. Because I have several thousand individuals, putting the individual ID in the CLASS statement of PROC SURVEYREG is very inefficient, and SAS reports insufficient memory. So I don't think I can use PROC SURVEYREG this way.
Can I achieve this using PROC GLM or PROC MODEL? I searched but didn't find a clear way to do so. Thanks in advance.
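One commonly used workaround, offered as a sketch rather than a definitive solution: sweep out the individual fixed effects by demeaning every model variable within individual (the within transformation), then run PROC SURVEYREG on the demeaned data with CLUSTER on the ID and NOINT. This avoids a CLASS statement with thousands of levels. The data set PANEL and the variables ID, T, Y, X1, X2 are hypothetical stand-ins that the first step simulates; replace them with your own. Any CLASS regressors would first have to be converted to 0/1 dummy variables and demeaned as well.

```sas
/* Hypothetical example panel: 500 individuals (ID), each observed 5 times (T). */
/* ALPHA is an individual fixed effect that is correlated with X1.             */
data panel;
  call streaminit(12345);
  do id = 1 to 500;
    alpha = rand('normal');
    do t = 1 to 5;
      x1 = rand('normal') + alpha;
      x2 = rand('uniform');
      y  = 1 + 2*x1 - 0.5*x2 + alpha + rand('normal');
      output;
    end;
  end;
  drop alpha;
run;

/* BY-group processing in PROC STDIZE requires the data sorted by ID. */
proc sort data=panel;
  by id;
run;

/* Within transformation: METHOD=MEAN subtracts each individual's mean */
/* from Y, X1, and X2 without rescaling them.                          */
proc stdize data=panel out=panel_dm method=mean;
  by id;
  var y x1 x2;
run;

/* OLS on the demeaned data with standard errors clustered by ID.   */
/* NOINT: the intercept is swept out together with the fixed effects. */
proc surveyreg data=panel_dm;
  cluster id;
  model y = x1 x2 / noint solution;
  ods output ParameterEstimates=pe_within;
run;
```

The slope estimates should be identical to those from putting ID in the CLASS statement (Frisch-Waugh-Lovell), but the standard errors can differ slightly, because the demeaned run does not count the absorbed fixed effects in the (n-1)/(n-p) degrees-of-freedom adjustment. Drop observations with a missing value in any model variable before demeaning, so that the means are computed on the estimation sample.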
Maybe PROC GLM with a WEIGHT statement? https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...
From the documentation: "If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least squares estimates are best linear unbiased estimators (BLUE)"
How are you thinking about including cluster in any model you would fit?
@braam wrote:
... and get standard errors clustered by IDs at the same time.
Now this implies that the standard errors clustered by IDs are the output of the regression. Is that correct? I thought the standard errors were inputs to a regression.
Show us the SURVEYREG code you were thinking of using, even if it doesn't work because there's too many individuals.
Here is the code you requested, using sashelp.cars. In this example, if Origin had many levels, this type of regression would become very inefficient; with my data it takes several hours.
Below is the PROC GLM code, where I cannot cluster the standard errors. Here I absorb Origin rather than estimating its fixed effects. I expected the two approaches to give the same coefficient on Cylinders, but they do not, which seems strange to me.
proc surveyreg data= sashelp.cars;
cluster Origin;
class Origin Type;
model EngineSize= Cylinders Origin Type/ solution;
run;
proc glm data= sashelp.cars;
absorb Origin;
class Type;
model EngineSize= Cylinders Type/ solution;
run;
SURVEYREG RESULT: Estimated Regression Coefficients

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | -0.2423962 | 0.24823069 | -0.98 | 0.4318 |
| Cylinders | 0.6195316 | 0.03299998 | 18.77 | 0.0028 |
| Origin Asia | -0.2473363 | 0.02963121 | -8.35 | 0.0141 |
| Origin Europe | -0.4510775 | 0.00538821 | -83.72 | 0.0001 |
| Origin USA | 0.0000000 | 0.00000000 | . | . |
| Type Hybrid | -0.1485498 | 0.10737472 | -1.38 | 0.3007 |
| Type SUV | 0.2723754 | 0.09245885 | 2.95 | 0.0985 |
| Type Sedan | -0.0206628 | 0.05296500 | -0.39 | 0.7341 |
| Type Sports | 0.1480223 | 0.17265540 | 0.86 | 0.4816 |
| Type Truck | 0.5319361 | 0.11385004 | 4.67 | 0.0429 |
| Type Wagon | 0.0000000 | 0.00000000 | . | . |
GLM RESULT

| Parameter | Estimate | | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Cylinders | 0.6292556337 | | 0.01473441 | 42.71 | <.0001 |
| Type Hybrid | -.1535480401 | B | 0.23825545 | -0.64 | 0.5196 |
| Type SUV | 0.2436920500 | B | 0.08982120 | 2.71 | 0.0070 |
| Type Sedan | -.0144629620 | B | 0.07368536 | -0.20 | 0.8445 |
| Type Sports | 0.0949267303 | B | 0.09199753 | 1.03 | 0.3028 |
| Type Truck | 0.4970593441 | B | 0.10899489 | 4.56 | <.0001 |
| Type Wagon | 0.0000000000 | B | . | . | . |
This seems to be a problem I will have to think about, as I don't see an obvious path forward right now. A large number of levels in any class variable does cause this problem: you either run out of memory or it takes a very long time.
How were you going to handle the issue that SAS always assigns a standard error of zero to one (or more) of the class levels?
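To get the same parameter estimates, first sort the data by Origin (the ABSORB statement in PROC GLM expects the data to be grouped by the absorbed variable), and specify NOINT in the SURVEYREG procedure so that every level of Origin gets its own estimate: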
proc sort data=sashelp.cars out=cars;
by Origin;
run;
proc surveyreg data=cars;
cluster Origin;
class Origin Type;
model EngineSize= Cylinders Origin Type/ noint solution;
ods select parameterestimates;
run;
proc glm data=cars;
absorb Origin;
class Type;
model EngineSize= Cylinders Type/ solution;
ods select parameterestimates;
quit;
The standard errors still differ because the variance estimation formulas for survey statistics (as in PROC SURVEYREG) are different from the model-based formulas used in linear modeling (PROC GLM). Although the point estimates are the same, the standard errors are not: the survey variance accounts for the sample design, here the clustering by Origin, which often inflates it.
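To make the difference concrete, here is a sketch of the two variance estimators as I understand them from the documentation, assuming one stratum, no sampling weights, no finite population correction, and the default VADJUST=DF adjustment in PROC SURVEYREG. Here $n$ is the number of observations, $p$ the number of estimated parameters, $G$ the number of clusters, $r_i$ the least squares residuals, and $X_g$, $r_g$ the design-matrix rows and residuals belonging to cluster $g$:

$$
\widehat{V}_{\mathrm{GLM}}(\hat\beta) = \hat\sigma^{2}\,(X^\top X)^{-1},
\qquad
\hat\sigma^{2} = \frac{1}{n-p}\sum_{i=1}^{n} r_i^{2}
$$

$$
\widehat{V}_{\mathrm{SURVEYREG}}(\hat\beta) = \frac{n-1}{n-p}\,\frac{G}{G-1}\,(X^\top X)^{-1}\left(\sum_{g=1}^{G} X_g^\top r_g\, r_g^\top X_g\right)(X^\top X)^{-1}
$$

The first assumes independent, equal-variance errors; the second only assumes independence across clusters. SURVEYREG also bases its t tests on $G-1$ degrees of freedom. With Origin as the cluster, $G = 3$, so the tests have only 2 degrees of freedom, which is consistent with the SURVEYREG table above (for example, $t = 18.77$ with $p = 0.0028$).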
If you have panel data, try posting this in the Forecast forum. Also try PROC PANEL.