I use the following code to cluster standard errors by industry and year.
Proc Surveyreg data=have;
cluster industry year;
class industry year;
model y= independent_variables industry year/solution;
run;
I have two questions.
1. Why does the estimate for the last level of a class variable (industry or year) become zero? In the following picture, the value for the year 2019 is zero. Can you explain why this happens and, if required, how to solve it? (Maybe the X matrix is not full rank?)
2. How can I get the output of PROC SURVEYREG as a SAS data set? I only want the parameter estimates for all companies (when I cluster by gvkey_id), especially the standard errors. Otherwise I would have to do it manually in Excel, which would take a long time because there are so many companies.
Thanks in advance for your valuable comments.
Question 1: This is how SAS (by default) handles regression coefficients for categorical (CLASS) variables. The last level in sorted order (here, year 2019) serves as the reference level and its coefficient is set to zero; the coefficients of the other levels are differences from that level. It's normal and nothing to worry about. There are many possible parameterizations for a model with categorical variables, and they are all equivalent: same model, same model fit, same predictions. See here for further explanation.
Rather than looking at the regression coefficients, it is better to look at the LSMEANS for the categorical variables. They don't have the problem of the last level being set to zero, and they are more easily interpretable.
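As a minimal sketch of that suggestion, assuming the same data set and variable names as in the question (independent_variables is the question's placeholder, not a real variable), an LSMEANS statement can be added to the original step:

```sas
/* Sketch only: same model as in the question, plus least squares means */
proc surveyreg data=have;
    cluster industry year;
    class industry year;
    model y = independent_variables industry year / solution;
    /* One adjusted mean per level of each class variable;      */
    /* no level is shown as zero the way the coefficients are.  */
    lsmeans industry year;
run;
```

The LSMEANS table reports an estimate and standard error for every level, including 2019.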
Question 2: Which output do you mean: regression coefficients, predicted values, or something else? PROC SURVEYREG has an OUTPUT statement for output data sets, and any table it displays can be turned into a SAS data set via ODS OUTPUT. The documentation has all the details.
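If the regression coefficients and their standard errors are what is wanted, a sketch along these lines may help. It assumes the firm identifier is called gvkey and that the table of interest is ParameterEstimates; work.pe is a hypothetical output data set name. Running ODS TRACE first confirms the table names on your own system.

```sas
/* Optional: write the name of every displayed table to the log */
ods trace on;

/* Save the parameter estimates table as a data set (work.pe is a made-up name) */
ods output ParameterEstimates=work.pe;

proc surveyreg data=have;
    cluster gvkey;              /* assumed firm identifier */
    class industry year;
    model y = independent_variables industry year / solution;
run;

ods trace off;

/* Check which columns were saved (estimates, standard errors, and so on) */
proc contents data=work.pe;
run;

proc print data=work.pe;
run;
```

The resulting data set can then be filtered, merged, or exported like any other SAS data set, which avoids copying values into Excel by hand.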
@PaigeMiller Sir, beautifully explained. Thank you. I have noted too! Cheers!
Hello PaigeMiller, I have a question. I want to capture year and industry fixed effects, which is why I am using this code:
Proc Surveyreg data=have;
cluster industry year;
class industry year;
model y= independent_variables industry year/solution;
run;
However, I want to get the standard errors for each firm. If I use the following code, does it still capture the fixed effects of year and industry?
Proc Surveyreg data=have;
cluster gvkey;
class industry year;
model y= independent_variables industry year/solution;
run;