Re: Fixed effect with clustered standard errors? proc glm?

braam · Posted 10-18-2019 10:35 AM

Dear All,

I was wondering how I can run a fixed-effects regression with clustered standard errors. I have panel data in which individuals are observed multiple times. I would like to run the regression with individual fixed effects and with standard errors clustered by individual. Since I have several thousand individuals, putting the ID in the CLASS statement of PROC SURVEYREG is very inefficient, and SAS reports insufficient memory. So I don't think I can use PROC SURVEYREG.

Can I achieve this using PROC GLM or PROC MODEL? I searched but didn't find a clear way to do so. Thanks in advance.

PaigeMiller · Posted 10-18-2019 10:40 AM

Maybe PROC GLM with a WEIGHT statement? https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...

From the documentation: "If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least squares estimates are best linear unbiased estimators (BLUE)"

--
Paige Miller

braam · Posted 10-18-2019 11:10 AM

Isn't WLS about heteroscedasticity (i.e., variance) while clustering standard errors is about covariance within a unit (having multiple observations)? I think they are two different issues.
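
To make the distinction concrete, here is a sketch in generic textbook notation (the symbols $\sigma_i^2$, $\Omega_g$ and the cluster index $g$ are my own, not taken from the SAS documentation). WLS assumes a diagonal error covariance with unequal variances, while clustering allows arbitrary correlation within each individual and assumes independence only across individuals:

```latex
\operatorname{Var}(\varepsilon)_{\text{WLS}}
  = \operatorname{diag}\bigl(\sigma_1^{2},\dots,\sigma_n^{2}\bigr),
\qquad
\operatorname{Var}(\varepsilon)_{\text{cluster}}
  = \operatorname{blockdiag}\bigl(\Omega_1,\dots,\Omega_G\bigr)
```

Here each $\Omega_g$ is the unrestricted covariance matrix of the errors for individual $g$.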

PaigeMiller · Posted 10-18-2019 11:38 AM

How are you thinking about including cluster in any model you would fit?

--
Paige Miller

braam · Posted 10-18-2019 02:04 PM

I'm not sure if I understand your suggestion.

What I would like to do is to include IDs as fixed effects and get standard errors clustered by IDs at the same time. I know it's possible with PROC SURVEYREG, but when I have many ID values, it's practically impossible. So I'm looking for another procedure.

PaigeMiller · Posted 10-18-2019 02:11 PM

@braam wrote:
... and get standard errors clustered by IDs at the same time.

Now this implies that the standard errors clustered by IDs are the output of the regression. Is that correct? I thought the standard errors were inputs to a regression.

--
Paige Miller

braam · Posted 10-18-2019 02:29 PM

Sorry for the confusion. Yes, I would like to 1) have clustered standard errors and 2) include individual-fixed effects.

PaigeMiller · Posted 10-18-2019 02:31 PM

Show us the SURVEYREG code you were thinking of using, even if it doesn't work because there are too many individuals.

--
Paige Miller

braam · Posted 10-18-2019 03:11 PM

Here is the code you requested. The first step is PROC SURVEYREG. With too many values of Origin, this type of regression becomes very inefficient; on my data it takes several hours.

The second step is PROC GLM, where I cannot cluster the standard errors. There I absorb Origin rather than estimating its fixed effects. I expected the same coefficient on Cylinders from the two approaches, but they differ, which is strange to me.

proc surveyreg data= sashelp.cars;
	cluster Origin;
	class Origin Type;
	model EngineSize= Cylinders Origin Type/ solution;
	run;

proc glm data= sashelp.cars;
	absorb Origin;
	class Type;
	model EngineSize= Cylinders Type/ solution;
	run;

SURVEYREG RESULT

Estimated Regression Coefficients
Parameter	Estimate	Standard Error	t Value	Pr > |t|
Intercept	-0.2423962	0.24823069	-0.98	0.4318
Cylinders	0.6195316	0.03299998	18.77	0.0028
Origin Asia	-0.2473363	0.02963121	-8.35	0.0141
Origin Europe	-0.4510775	0.00538821	-83.72	0.0001
Origin USA	0.0000000	0.00000000	.	.
Type Hybrid	-0.1485498	0.10737472	-1.38	0.3007
Type SUV	0.2723754	0.09245885	2.95	0.0985
Type Sedan	-0.0206628	0.05296500	-0.39	0.7341
Type Sports	0.1480223	0.17265540	0.86	0.4816
Type Truck	0.5319361	0.11385004	4.67	0.0429
Type Wagon	0.0000000	0.00000000	.	.

GLM RESULT

Parameter	Estimate		Standard Error	t Value	Pr > |t|
Cylinders	0.6292556337		0.01473441	42.71	<.0001
Type Hybrid	-.1535480401	B	0.23825545	-0.64	0.5196
Type SUV	0.2436920500	B	0.08982120	2.71	0.0070
Type Sedan	-.0144629620	B	0.07368536	-0.20	0.8445
Type Sports	0.0949267303	B	0.09199753	1.03	0.3028
Type Truck	0.4970593441	B	0.10899489	4.56	<.0001
Type Wagon	0.0000000000	B	.	.	.

PaigeMiller · Posted 10-18-2019 03:24 PM

This seems to be a problem that I will have to think about, as I don't see an obvious path forward right now. A large number of levels in any class variable does cause this problem: either you run out of memory or the run takes a very long time.

How were you going to handle the issue that SAS always assigns a standard error of zero to one (or more) of the class levels?

--
Paige Miller

Rick_SAS · Posted 10-21-2019 11:27 AM

To get the same parameter estimates, you need to specify NOINT in the SURVEYREG procedure:

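/* ABSORB requires the input to be sorted by the absorbed variable */
proc sort data=sashelp.cars out=cars;
	by Origin;
run;

/* Origin as explicit fixed effects, standard errors clustered by Origin */
proc surveyreg data=cars;
	cluster Origin;
	class Origin Type;
	model EngineSize= Cylinders Origin Type/ noint solution;
	ods select parameterestimates;
run;

/* Origin absorbed (within transformation), model-based standard errors */
proc glm data=cars;
	absorb Origin;
	class Type;
	model EngineSize= Cylinders Type/ solution;
	ods select parameterestimates;
quit;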
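
For the original problem with thousands of IDs, the absorb idea can also be sketched without any CLASS variable for the ID: demean the response and the regressors within each ID, then let PROC SURVEYREG cluster on the ID. This is only a sketch under stated assumptions. The data set names cars_cc and cars_dm are hypothetical, Type is left out to keep it short (so the Cylinders estimate will not equal the ones above), and the finite-sample corrections do not account for the absorbed means, so the standard errors are worth checking against the CLASS version on a small subset.

```sas
/* Keep complete cases so the group means match the estimation sample, */
/* and sort by the fixed-effect / cluster variable for BY processing   */
proc sort data=sashelp.cars out=cars_cc;
   by Origin;
   where not missing(EngineSize) and not missing(Cylinders);
run;

/* Within transformation: METHOD=MEAN subtracts each Origin's mean */
proc stdize data=cars_cc out=cars_dm method=mean;
   by Origin;
   var EngineSize Cylinders;
run;

/* Regress demeaned y on demeaned x; CLUSTER gives clustered standard errors */
proc surveyreg data=cars_dm;
   cluster Origin;
   model EngineSize = Cylinders / noint;
run;
```

Because the ID never enters a CLASS statement, memory use should not grow with the number of individuals. Categorical regressors such as Type would first have to be expanded into dummy variables and demeaned in the same way.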

braam · Posted 10-21-2019 12:50 PM

Thanks! I confirmed it! One thing that is interesting to me is that the coefficient on Cylinders is 0.619 both ways, but the t-statistics differ a lot: 18.77 for SURVEYREG versus 46.32 for GLM.

Is it because absorbing fixed effects (conceptually, demeaning) influences the variance-covariance matrix?

Rick_SAS · Posted 10-21-2019 01:45 PM

It is because the variance estimation formulas for survey statistics (as in PROC SURVEYREG) are different from the variance estimation formulas in linear modeling. Although the point estimates are the same, the standard errors are not. The survey variance is inflated here because it accounts for the sample design, in this case the clustering by Origin.
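
As a rough sketch in generic textbook notation (these are not the exact finite-sample formulas either procedure uses, and the symbols $X$, $\hat e_g$ and the cluster index $g$ are assumptions for illustration), the linear-model variance treats the errors as independent with equal variance, whereas a cluster-robust variance sums residual cross-products within each cluster:

```latex
\widehat{V}_{\text{model}} = \hat\sigma^{2}\,(X^{\top}X)^{-1},
\qquad
\widehat{V}_{\text{cluster}} = (X^{\top}X)^{-1}
  \Bigl(\sum_{g=1}^{G} X_g^{\top}\,\hat e_g\,\hat e_g^{\top}\,X_g\Bigr)
  (X^{\top}X)^{-1}
```

The reference distribution differs as well. With Origin as the cluster there are only three clusters, and the SURVEYREG output earlier in the thread is consistent with a t distribution on 2 degrees of freedom (t = 18.77 gives p = 0.0028), so inference from so few clusters should be read with caution.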

Ksharp · Posted 10-19-2019 08:09 AM

If you have panel data, try posting it in the Forecast forum. Also try PROC PANEL.
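
A minimal sketch of what that might look like, assuming SAS/ETS is licensed and that your release supports the HCCME= and CLUSTER options on the MODEL statement (please check the PROC PANEL documentation for your version). The data set mypanel and the variables person_id, period, y, x1 and x2 are hypothetical placeholders:

```sas
/* Hypothetical panel: one row per person_id and period */
proc sort data=mypanel;
   by person_id period;
run;

proc panel data=mypanel;
   id person_id period;      /* cross-section ID first, then time ID */
   model y = x1 x2
         / fixone            /* one-way (individual) fixed effects */
           hccme=1 cluster;  /* robust covariance, adjusted for clustering by cross section */
run;
```

Comparing the output with the SURVEYREG version on a small subset of individuals would be a sensible check before running it on the full data.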
