Dear all,
I need your help finding an adequate procedure to analyze a panel dataset with several thousand firm-year observations.
My dataset, of which I have also attached a small excerpt of 100 observations, includes the variables firm_id, year, Industry_id, the dependent variable Y, and three independent variables X1-X3.
My goal is to run a regression with industry and year fixed effects and with standard errors clustered at the firm level.
I have seen that several options are possible, but I wonder whether I understood them correctly, how they differ, and which one would be appropriate for me to use:
Option 1: Proc surveyreg
proc surveyreg data=testset;
cluster firm_id;
class Industry_id Year;
model Y = X1 X2 X3 / solution;
run;
quit;
The problem I see here is that PROC SURVEYREG is mainly meant for analyzing survey data, not regular panel data. Or am I wrong, and this should not be a concern? Furthermore, is it correct that the CLUSTER statement is responsible for the firm-level clustering and the CLASS statement for the fixed effects, or what exactly is the CLASS statement doing in this case?
Option 2: Proc GLM
proc glm data=testset;
class industry_id year;
model Y = X1 X2 X3 /solution;
run;
quit;
Unfortunately, I don't see how to cluster by firm with this option; is there a statement for that? Also, is the CLASS statement correct in this case for obtaining a regression with year and industry fixed effects? Otherwise, would the ABSORB statement be the correct way to account for the fixed effects?
Option 3: Proc panel
proc sort data=testset;
by firm_id year;
run;
proc panel data=testset;
id firm_id year;
model Y = X1 X2 X3 /Fixtwo;
Run;
After sorting the dataset, I tried PROC PANEL, but here I don't see where to include the Industry_id variable to account for industry fixed effects. Also, am I right that the FIXTWO option controls for 1) firm fixed effects (via firm_id) and 2) year fixed effects? How could I instead include Industry_id as a fixed effect and cluster the standard errors at the firm level?
A fourth option might be PROC TSCSREG:
proc tscsreg data=testset;
id firm_id year;
model Y = X1 X2 X3 /fixtwo;
run;
quit;
But again, here I don't cluster the standard errors, and I would again be accounting for firm and year fixed effects instead of industry fixed effects.
So, in general, I cannot find a way to combine clustering with the fixed effects approach, and I also don't know how to include the Industry_id variable to obtain industry fixed effects.
Has anybody already run this kind of regression, and/or could someone please help me with this problem?
Hello,
If you have firm-year panel data and you want to include industry and year fixed effects and cluster-adjust the standard errors at the firm level, the appropriate procedure to use is PROC PANEL. The SURVEYREG procedure is designed for survey data, not panel data. PROC GLM does not provide functionality to obtain cluster-adjusted standard errors, and neither does PROC TSCSREG.
In PROC PANEL, you can use the CLUSTER option together with the HCCME= option in the MODEL statement to request heteroscedasticity- and cluster-adjusted standard errors on the cross-sectional dimension (firm, in your case). The following usage note provides more details on how to use the CLUSTER option in several different scenarios, with example syntax:
https://support.sas.com/kb/67/322.html
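For orientation, the cluster adjustment is a sandwich estimator in which residual cross-products are summed within each firm rather than observation by observation. The block below is the textbook (Liang-Zeger) form, not necessarily the exact formula PROC PANEL uses: the notation ($g$ indexes firms, $G$ is the number of firms, $X_g$ and $\hat{u}_g$ are the regressor rows and OLS residuals belonging to firm $g$) is introduced here for illustration only, and the finite-sample scaling attached to each HCCME= value is documented in the usage note above.

```latex
\widehat{V}_{\text{cluster}}(\hat{\beta})
  = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \, \hat{u}_g \hat{u}_g' \, X_g \right)
    (X'X)^{-1}
```

Because residuals are multiplied together across all years of the same firm, arbitrary within-firm correlation over time is allowed, while different firms are assumed to be independent. With a heteroscedasticity correction alone and no clustering, the middle term would instead be the sum of $x_{it}' \hat{u}_{it}^2 x_{it}$ over individual observations.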
To include industry fixed effects, specify the Industry_id variable in both the CLASS statement and the MODEL statement, and specify the POOLED option to request a pooled OLS regression. To include year fixed effects, you have two choices: either specify the year variable together with Industry_id in the CLASS and MODEL statements and keep the POOLED option, or specify the FIXONETIME option instead of POOLED and leave year out of the CLASS and MODEL statements, since FIXONETIME automatically includes time (year) fixed effects in the model.
However, as discussed in the usage note above, be aware that the two approaches differ in how the cluster adjustment treats the time (year) fixed effects estimates: if you specify year in the CLASS and MODEL statements together with the POOLED option, the cluster adjustment applies to the year fixed effects estimates as well. If you specify FIXONETIME instead of POOLED, the HCCME and cluster adjustments do not apply to the year fixed effects.
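As a reference for what these specifications estimate, the model can be written out as below. The symbols are assumptions introduced for illustration: $i$ indexes firms, $t$ years, $j(i,t)$ is the industry of firm $i$ in year $t$, $\mu$ is the intercept, $\alpha_j$ are the industry effects, and $\gamma_t$ are the year effects.

```latex
Y_{it} = \mu + \beta_1 X1_{it} + \beta_2 X2_{it} + \beta_3 X3_{it}
       + \alpha_{j(i,t)} + \gamma_t + \varepsilon_{it}
```

Listing Industry_id and year in the CLASS and MODEL statements estimates $\alpha_j$ and $\gamma_t$ as coefficients on dummy variables; because the model also has an intercept $\mu$, one level of each is redundant and not separately identified. Under FIXONETIME, the $\gamma_t$ are included automatically as time fixed effects, and PRINTFIXED prints their estimates.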
Below is an example of these two approaches in PROC PANEL, specifying industry and year fixed effects and obtaining cluster adjustment at the firm level. Note that HCCME=1 in the code is only an example; you can choose HCCME=0, 1, 2, or 3 together with the CLUSTER option, as discussed in the usage note above.
/*method 1: specify industry_id and year in CLASS and MODEL statements, and specify
POOLED option */
ods select parameterestimates;
proc panel data=testset;
id firm_id year;
class Industry_id year ;
model Y = X1 X2 X3 Industry_id year/pooled hccme=1 cluster ;
run;
/*method 2: specify industry_id only in the CLASS and MODEL statements, specify FIXONETIME option in MODEL statement*/
ods select parameterestimates;
proc panel data=testset;
id firm_id year;
class Industry_id ;
model Y = X1 X2 X3 Industry_id /fixonetime printfixed hccme=1 cluster;
run;
I hope this helps.