Walternate
Obsidian | Level 7

Hi,

 

I have a dataset with a categorical variable with hundreds of values, many dummy variables, and a continuous variable. I'm trying to create a regression model with the continuous variable as the dependent variable and the dummies/categorical variable as the independent variables, and include robust standard errors in the output. I know two ways to create linear regression models in SAS: proc glm can convert the categorical var to dummies and suppress the output of the different levels, but from what I can tell it can't produce robust standard errors. Proc reg can get me the robust SEs, but can't deal with the categorical variable. Is there some sort of workaround, or even an alternative procedure that would yield everything I need in one step?

 

Any help is much appreciated.

6 REPLIES
Rick_SAS
SAS Super FREQ

What options are you using to get "robust standard errors" in PROC REG? PROC REG is an OLS procedure, so most of its estimates are not robust.

 

I can think of two possibilities:

1. Use PROC GLMMOD to generate the dummy variables from the categorical variables. Then use those dummy variables in PROC REG.

2. Switch to a procedure such as PROC ROBUSTREG that supports the CLASS statement and performs robust regression computations with estimates for standard errors.

 

proc robustreg data=sashelp.cars;
   class origin;
   model mpg_city = origin;
run;
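For the first possibility, a minimal sketch follows, using the same sashelp.cars data. It assumes the "robust standard errors" in question are the heteroscedasticity-consistent ones requested by the HCC option on the PROC REG MODEL statement. The data set names CarsDesign and CarsParms and the extra Weight regressor are only for illustration. GLMMOD names the design columns Col1, Col2, and so on; here Col1 should be the intercept, Col2-Col4 the three Origin levels, and Col5 Weight, but check the GLMMOD parameter table before relying on those column numbers.

```sas
/* Build the design matrix; the CLASS variable is expanded into dummy columns. */
/* CarsDesign and CarsParms are illustrative data set names.                   */
proc glmmod data=sashelp.cars outdesign=CarsDesign outparm=CarsParms;
   class origin;
   model mpg_city = origin weight;
run;

/* Leave out Col4 (assumed to be the last Origin level) so the dummies are */
/* not collinear with the intercept. HCC requests heteroscedasticity-      */
/* consistent standard errors.                                             */
proc reg data=CarsDesign;
   model mpg_city = Col2 Col3 Col5 / hcc;
run;
quit;
```

With hundreds of levels this still estimates one coefficient per dummy, so it is a sketch of the mechanics rather than a fast solution.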
Walternate
Obsidian | Level 7

Thank you for your response! I tried both of the methods you suggested, but the models took forever to run. Unlike proc glm, they can't "absorb" the categorical variable with hundreds of values, so SAS struggles to calculate coefficients for all of the dummies from the categorical variable, even though I don't need the coefficients for each level of the categorical var.

Rick_SAS
SAS Super FREQ

1. What options are you using to get "robust standard errors" in PROC REG?

2. How are you using the regression model if you don't know the coefficients for each level? To score the model (make a prediction) you need the coefficients, so what are you trying to accomplish?

Walternate
Obsidian | Level 7

I've only used proc glm so far. I have read that it is possible to get robust SEs in proc reg using a model statement option (it's /white or something to that effect). What I've done so far is a proc glm step like this:

 

proc glm data=mydata;
   absorb categ_var;
   model dependent=indep_var1 indep_var2 / solution noint;
run;

 

This gives me estimates of the coefficients of the indep_vars, it does NOT give me coefficients for all the values of categ_var (which I don't need anyway), but takes categ_var into account in the calculations. The only difference between the output from that step and what I need is that from what I can tell, there's no way to get robust SEs in proc glm.

 

PROC ROBUSTREG and PROC SURVEYREG have CLASS statements but not ABSORB statements, so the output includes coefficients for all the levels of the categorical variable, which I don't need and which makes the model take a much longer time to run.
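For reference, a sketch of the kind of PROC SURVEYREG step described here, reusing the variable names from the PROC GLM step above. It assumes no survey design statements (CLUSTER, STRATA, WEIGHT) are needed; in that case the Taylor-series standard errors it reports are, as far as I know, heteroscedasticity-robust. Every level of categ_var is estimated explicitly, which is where the run time goes.

```sas
/* Sketch only: mydata, categ_var, dependent, indep_var1 and indep_var2 */
/* are the names used in the PROC GLM step in this thread.              */
proc surveyreg data=mydata;
   class categ_var;
   /* SOLUTION prints the parameter estimates, including one per level */
   model dependent = categ_var indep_var1 indep_var2 / solution;
run;
```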

Rick_SAS
SAS Super FREQ

From your syntax, I assume that Categ_Var is a variable that identifies individuals for which you have repeated measurements.

 

 

If you are adamant about following the ABSORB approach, you can manually adjust the data by subtracting the average value of the dependent and independent variables over each level of Categ_Var, as shown in Allison (2006), "Fixed Effects Regression Methods in SAS", pp. 4-6.

Since the data are already sorted by Categ_Var (the ABSORB statement requires this), you can run PROC MEANS with

BY categ_var;

then use a DATA step to merge the output with your data and subtract the means.
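The steps above can be sketched roughly as follows. This is a minimal sketch, not tested on your data: it reuses the names from the PROC GLM step (mydata, categ_var, dependent, indep_var1, indep_var2), while grp_means, demeaned, and the m_* and *_c variables are hypothetical names made up for illustration. It assumes the data are sorted by categ_var and have no missing values in the analysis variables, and it assumes the HCC option in PROC REG is the kind of robust standard error you want.

```sas
/* 1. Mean of each analysis variable within each level of categ_var */
proc means data=mydata noprint;
   by categ_var;
   var dependent indep_var1 indep_var2;
   output out=grp_means mean=m_dep m_x1 m_x2;
run;

/* 2. Merge the group means back and subtract them (within-group centering) */
data demeaned;
   merge mydata grp_means(keep=categ_var m_dep m_x1 m_x2);
   by categ_var;
   dep_c = dependent  - m_dep;
   x1_c  = indep_var1 - m_x1;
   x2_c  = indep_var2 - m_x2;
run;

/* 3. OLS on the centered variables; HCC requests                  */
/*    heteroscedasticity-consistent standard errors                */
proc reg data=demeaned;
   model dep_c = x1_c x2_c / noint hcc;
run;
quit;
```

The coefficient estimates should match the ABSORB results, but PROC REG does not know that one mean was estimated per level of categ_var, so the standard errors it reports do not account for those lost degrees of freedom.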

 

Other experts might suggest alternative SAS procedures that can handle this type of data without resorting to the "absorb" method.

 

Ksharp
Super User
You could use PROC GLMSELECT to get the most significant variables.
PROC GLMSELECT can handle hundreds or even thousands of variables.
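A minimal sketch of that idea, assuming the variable names from the PROC GLM step earlier in the thread and stepwise selection on significance level as one possible choice of method:

```sas
/* Sketch only: variable names are taken from the earlier PROC GLM step. */
/* SELECT=SL adds and removes effects based on significance level.       */
proc glmselect data=mydata;
   class categ_var;
   model dependent = categ_var indep_var1 indep_var2
         / selection=stepwise(select=sl);
run;
```

As far as I know this selects effects but does not by itself report robust standard errors, so the selected model would still need to be refit in another procedure for those.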

