- Linear regression in SAS with robust SEs and large...

09-23-2016 08:41 AM

Hi,

I have a dataset with a categorical variable with hundreds of values, many dummy variables, and a continuous variable. I'm trying to fit a regression model with the continuous variable as the dependent variable and the dummies and the categorical variable as the independent variables, and to include robust standard errors in the output. I know two ways to fit linear regression models in SAS. Proc glm can convert the categorical variable to dummies and suppress the output for the different levels, but from what I can tell it can't produce robust standard errors. Proc reg can get me the robust SEs, but can't deal with the categorical variable. Is there some sort of workaround, or even an alternative procedure that would yield everything I need in one step?

Any help is much appreciated.

09-23-2016 09:20 AM

What options are you using to get "robust standard errors" in PROC REG? As an OLS procedure, most REG estimates are not robust.

I can think of two possibilities:

1. Use PROC GLMMOD to generate the dummy variables from the categorical variables. Then use those dummy variables in PROC REG.

2. Switch to a procedure such as PROC ROBUSTREG that supports the CLASS statement and performs robust regression computations with estimates for standard errors.

```
proc robustreg data=sashelp.cars;
   class origin;
   model mpg_city = origin;
run;
```
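
For the first possibility, a minimal sketch could look like the following. The data set names `work.design` and `work.parms`, the extra regressor `horsepower`, and the choice of USA as the reference level are only illustrative assumptions. PROC GLMMOD names the design columns Col1, Col2, and so on, so check the OUTPARM= data set to confirm which column corresponds to which effect before writing the MODEL statement.

```
/* Build the design matrix; the response is carried into OUTDESIGN= */
proc glmmod data=sashelp.cars outdesign=work.design outparm=work.parms noprint;
   class origin;
   model mpg_city = origin horsepower;
run;

/* Assumed column layout (verify in work.parms):            */
/*   Col1 = intercept, Col2-Col4 = origin levels,           */
/*   Col5 = horsepower. Col4 (USA) is left out as the       */
/*   reference level so the model is not overparameterized. */
proc reg data=work.design;
   model mpg_city = col2 col3 col5 / white;
run;
quit;
```

The WHITE option on the MODEL statement asks PROC REG for heteroscedasticity-consistent standard errors. With hundreds of levels this approach still estimates one coefficient per dummy.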

09-23-2016 01:04 PM

Thank you for your response! I tried both of the methods you suggested, but the models were taking forever to run. Unlike proc glm, they can't "absorb" the categorical variable with hundreds of values, so SAS struggles to calculate coefficients for all of the dummies from the categorical variable, even though I don't need the coefficients for each level.

09-23-2016 01:08 PM

1. What options are you using to get "robust standard errors" in PROC REG?

2. How are you using the regression model if you don't know the coefficients for each level? To score the model (make a prediction) you need the coefficients, so what are you trying to accomplish?

09-23-2016 01:14 PM

I've only used proc glm so far. I have just read that it is possible to get robust SEs in proc reg using some option (it's /white or something to that effect). What I've done so far is a proc glm step like this:

```
proc glm data=mydata;
   absorb categ_var;
   model dependent = indep_var1 indep_var2 / solution noint;
run;
```

This gives me estimates of the coefficients of the indep_vars. It does NOT give me coefficients for all the values of categ_var (which I don't need anyway), but it takes categ_var into account in the calculations. The only difference between the output from that step and what I need is that, from what I can tell, there's no way to get robust SEs in proc glm.

Procedures such as PROC ROBUSTREG and PROC SURVEYREG have CLASS statements but no ABSORB statement, so the output includes coefficients for all the levels of the categorical variable, which I don't need and which makes the model take much longer to run.

09-23-2016 02:05 PM

From your syntax, I assume that Categ_Var is a variable that identifies individuals for which you have repeated measurements.

If you are adamant about following the ABSORB approach, you can manually adjust the data by subtracting the average value of the dependent and independent variables over each level of Categ_Var, as shown in Allison (2006), "Fixed Effects Regression Methods in SAS", p. 4-6.

Since the data are already sorted by Categ_Var, you can run PROC MEANS with

BY categ_var;

then use a DATA step to merge the output with your data and subtract the means.
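
A rough sketch of those steps, reusing the variable names from your PROC GLM code, might look like this. The data set names `grp_means` and `demeaned` and the `m_` and `d_` prefixes are placeholders I made up. The final PROC REG step with the WHITE option is my assumption about how you would then request the robust standard errors.

```
/* Group means of the response and regressors; */
/* assumes mydata is sorted by categ_var.      */
proc means data=mydata noprint;
   by categ_var;
   var dependent indep_var1 indep_var2;
   output out=grp_means(drop=_type_ _freq_)
          mean=m_dependent m_indep_var1 m_indep_var2;
run;

/* Merge the means back and subtract them (within transformation) */
data demeaned;
   merge mydata grp_means;
   by categ_var;
   d_dependent  = dependent  - m_dependent;
   d_indep_var1 = indep_var1 - m_indep_var1;
   d_indep_var2 = indep_var2 - m_indep_var2;
run;

/* OLS on the demeaned data, with heteroscedasticity-consistent SEs */
proc reg data=demeaned;
   model d_dependent = d_indep_var1 d_indep_var2 / noint white;
run;
quit;
```

The coefficient estimates should match the ABSORB results. One caveat: PROC REG does not know that the group means were removed, so its error degrees of freedom do not account for the absorbed levels, and the conventional standard errors and tests may need adjusting for that.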

Other experts might suggest alternative SAS procedures that can handle this type of data without resorting to the "absorb" method.

09-23-2016 11:20 PM

You could use PROC GLMSELECT to get the most significant variables; it can handle hundreds or even thousands of variables.
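
A minimal sketch, assuming the variable names from the PROC GLM code earlier in the thread and stepwise selection as just one possible method:

```
/* Stepwise effect selection; categ_var enters or leaves as a whole effect */
proc glmselect data=mydata;
   class categ_var;
   model dependent = categ_var indep_var1 indep_var2 / selection=stepwise;
run;
```

As far as I know this only helps with choosing effects. It does not by itself report robust standard errors, so the selected model would still have to be refit with a procedure that does.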