Programming the statistical procedures from SAS

Linear regression in SAS with robust SEs and large categorical vars

Frequent Contributor
Posts: 138


Hi,

 

I have a dataset with one categorical variable that takes hundreds of values, many dummy variables, and a continuous variable. I'm trying to fit a regression model with the continuous variable as the dependent variable and the dummies and the categorical variable as the independent variables, and I need robust standard errors in the output. I know two ways to fit linear regression models in SAS. PROC GLM can convert the categorical variable to dummies and suppress the output for its individual levels, but as far as I can tell it can't produce robust standard errors. PROC REG can give me the robust SEs, but it can't handle the categorical variable directly. Is there a workaround, or an alternative procedure that would yield everything I need in one step?

 

Any help is much appreciated.

SAS Super FREQ
Posts: 3,306

Re: Linear regression in SAS with robust SEs and large categorical vars

What options are you using to get "robust standard errors" in PROC REG? Because REG is an OLS procedure, most of its estimates are not robust.

 

I can think of two possibilities:

1. Use PROC GLMMOD to generate the dummy variables from the categorical variables. Then use those dummy variables in PROC REG.

2. Switch to a procedure such as PROC ROBUSTREG that supports the CLASS statement and performs robust regression computations with estimates for standard errors.

 

proc robustreg data=sashelp.cars;
   class origin;               /* categorical predictor; dummies are built internally */
   model mpg_city = origin;    /* M estimation is the default method */
run;
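
For the first possibility, a minimal sketch along the same lines, again using sashelp.cars. The data set names cars_design and cars_parm are made up for illustration, and the column mapping in the comments (Col1 = intercept, Col2-Col4 = the three Origin levels, Col5 = Weight) is an assumption that should be checked against the OUTPARM= data set before relying on it. The HCC option on the MODEL statement asks PROC REG for heteroscedasticity-consistent standard errors.

```sas
/* Build the design matrix; OUTPARM= records which ColN belongs to which effect */
proc glmmod data=sashelp.cars outdesign=cars_design outparm=cars_parm noprint;
   class origin;
   model mpg_city = origin weight;
run;

/* Assumed mapping: Col2, Col3 = first two Origin levels, Col5 = Weight.  */
/* Col4 (last Origin level) is left out so it serves as the reference.    */
proc reg data=cars_design;
   model mpg_city = col2 col3 col5 / hcc;
run;
quit;
```

Note that this still estimates one coefficient per dummy, so it does not avoid the cost of a categorical variable with hundreds of levels.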
Frequent Contributor
Posts: 138

Re: Linear regression in SAS with robust SEs and large categorical vars

Thank you for your response! I tried both of the methods you suggested, but the models took far too long to run. Unlike PROC GLM, these procedures can't "absorb" the categorical variable with hundreds of values, so SAS struggles to calculate coefficients for all of the dummies generated from it, even though I don't need to know the coefficients for each level of the categorical variable.

SAS Super FREQ
Posts: 3,306

Re: Linear regression in SAS with robust SEs and large categorical vars

1. What options are you using to get "robust standard errors" in PROC REG?

2. How are you using the regression model if you don't know the coefficients for each level? To score the model (make a prediction) you need the coefficients, so what are you trying to accomplish?

Frequent Contributor
Posts: 138

Re: Linear regression in SAS with robust SEs and large categorical vars

I've only used PROC GLM so far. I have just read that it is possible to get robust SEs in PROC REG through a MODEL statement option (it's /white or something to that effect). What I've run so far is a PROC GLM step like this:

 

proc glm data=mydata;
   absorb categ_var;
   model dependent=indep_var1 indep_var2/solution noint;
run;

 

This gives me estimates of the coefficients of the indep_vars. It does NOT give me coefficients for all the values of categ_var (which I don't need anyway), but it does take categ_var into account in the calculations. The only difference between the output from that step and what I need is that, as far as I can tell, there's no way to get robust SEs in PROC GLM.

 

PROC ROBUSTREG and PROC SURVEYREG have CLASS statements but no ABSORB statement, so the output includes coefficients for all the levels of the categorical variable, which I don't need and which makes the model take much longer to run.

SAS Super FREQ
Posts: 3,306

Re: Linear regression in SAS with robust SEs and large categorical vars

From your syntax, I assume that Categ_Var is a variable that identifies individuals for which you have repeated measurements.

 

 

If you are adamant about following the ABSORB approach, you can manually adjust the data by subtracting the average value of the dependent and independent variables over each level of Categ_Var, as shown in Allison (2006), "Fixed Effects Regression Methods in SAS", pp. 4-6.

Since the data are already sorted by Categ_Var, you can run PROC MEANS with a BY categ_var statement, then use a DATA step to merge the output with your data and subtract the means.
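
A minimal sketch of that demeaning approach, followed by PROC REG with White-type standard errors. The variable names come from the PROC GLM step posted earlier in the thread; grp_means, demeaned, and the m_ and _c names are hypothetical. It assumes the data are sorted by categ_var and that none of the three analysis variables has missing values (otherwise the group means and the regression would use different observations). HCCMETHOD=0 is chosen here because that estimator applies no degrees-of-freedom correction, so it should not be affected by PROC REG being unaware of the absorbed levels; the ordinary standard errors and the other HCCMETHOD= variants from this step would not account for the estimated group means.

```sas
/* Step 1: group means of the response and regressors, one row per level */
proc means data=mydata noprint;
   by categ_var;
   var dependent indep_var1 indep_var2;
   output out=grp_means(drop=_type_ _freq_) mean=m_dep m_x1 m_x2;
run;

/* Step 2: merge the means back onto the data and subtract them */
data demeaned;
   merge mydata grp_means;
   by categ_var;
   dep_c = dependent  - m_dep;
   x1_c  = indep_var1 - m_x1;
   x2_c  = indep_var2 - m_x2;
run;

/* Step 3: regress the centered response on the centered regressors */
proc reg data=demeaned;
   model dep_c = x1_c x2_c / noint hcc hccmethod=0;
run;
quit;
```

The coefficient estimates for x1_c and x2_c should match the PROC GLM ABSORB estimates for indep_var1 and indep_var2, which gives a quick check that the centering was done correctly.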

 

Other experts might suggest alternative SAS procedures that can handle this type of data without resorting to the "absorb" method.

 

Grand Advisor
Posts: 9,447

Re: Linear regression in SAS with robust SEs and large categorical vars

You could use PROC GLMSELECT to pick out the most significant variables.
PROC GLMSELECT can handle hundreds or even thousands of variables.
