09-23-2016 08:41 AM
I have a dataset with a categorical variable with hundreds of values, many dummy variables, and a continuous variable. I'm trying to create a regression model with the continuous variable as the dependent variable and the dummies/categorical variable as the independent variables, and include robust standard errors in the output. I know two ways to create linear regression models in SAS: proc glm can convert the categorical var to dummies and suppress the output of the different levels, but from what I can tell it can't produce robust standard errors. Proc reg can get me the robust SEs, but can't deal with the categorical variable. Is there some sort of workaround, or even an alternative procedure that would yield everything I need in one step?
Any help is much appreciated.
09-23-2016 09:20 AM
What options are you using to get "robust standard errors" in PROC REG? As an OLS procedure, most REG estimates are not robust.
I can think of two possibilities:
1. Use PROC GLMMOD to generate the dummy variables from the categorical variables. Then use those dummy variables in PROC REG.
2. Switch to a procedure such as PROC ROBUSTREG that supports the class statements and performs robust regression computations with estimates for standard errors.
proc robustreg data=sashelp.cars;
   class origin;
   model mpg_city = origin;
run;
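For the first possibility, a minimal sketch might look like the following. It assumes the same sashelp.cars example; the data set names design and parms are hypothetical, and the column numbering (Col1 for the intercept, Col2-Col4 for the three levels of origin) should be checked against the OUTPARM= data set before relying on it. The HCC option on the MODEL statement asks PROC REG for heteroscedasticity-consistent standard errors.

```sas
/* Step 1: let PROC GLMMOD build the dummy variables.               */
/* OUTDESIGN= holds the response plus design columns Col1, Col2, ... */
/* OUTPARM= maps each column back to an effect and level.            */
proc glmmod data=sashelp.cars outdesign=design outparm=parms noprint;
   class origin;
   model mpg_city = origin;
run;

/* Step 2: use the dummy columns in PROC REG.                        */
/* Col1 is assumed to be the intercept, which PROC REG adds itself.  */
/* Col4 (the last level of origin) is left out as the reference      */
/* level so the design matrix stays full rank.                       */
proc reg data=design;
   model mpg_city = col2-col3 / hcc;
run;
quit;
```

With a categorical variable that has hundreds of levels, this still estimates one coefficient per dummy, so it is not expected to be fast.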
09-23-2016 01:04 PM
Thank you for your response! I tried both of the methods you suggested, but the models took forever to run. Unlike PROC GLM, those procedures can't "absorb" the categorical variable with hundreds of values, so SAS struggles to calculate coefficients for all of the dummies, even though I don't need the coefficients for each level of the categorical variable.
09-23-2016 01:08 PM
1. What options are you using to get "robust standard errors" in PROC REG?
2. How are you using the regression model if you don't know the coefficients for each level? To score the model (make a prediction) you need the coefficients, so what are you trying to accomplish?
09-23-2016 01:14 PM
I've only used a proc glm so far. I have just read that it is possible to get robust SEs in proc reg using some option (it's /white or something to that effect). What I've done so far is a proc glm statement like this:
proc glm data=mydata;
   absorb categ_var;
   model dependent=indep_var1 indep_var2/solution noint;
run;
This gives me estimates of the coefficients of the indep_vars, it does NOT give me coefficients for all the values of categ_var (which I don't need anyway), but takes categ_var into account in the calculations. The only difference between the output from that step and what I need is that from what I can tell, there's no way to get robust SEs in proc glm.
PROC ROBUSTREG and PROC SURVEYREG have CLASS statements but no ABSORB statement, so the output includes coefficients for all the levels of the categorical variable, which I don't need and which makes the model take much longer to run.
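For reference, the PROC SURVEYREG attempt described above would look roughly like this sketch. The data set and variable names are the placeholders used in this thread. With no STRATA or CLUSTER statement, the procedure's Taylor series variance estimates are often used as heteroscedasticity-robust standard errors, but every level of categ_var still gets its own coefficient.

```sas
/* CLASS creates the dummies internally; there is no ABSORB statement, */
/* so all levels of categ_var are estimated and printed.               */
proc surveyreg data=mydata;
   class categ_var;
   model dependent = categ_var indep_var1 indep_var2 / solution;
run;
```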
09-23-2016 02:05 PM
From your syntax, I assume that Categ_Var is a variable that identifies individuals for which you have repeated measurements.
If you are adamant about following the ABSORB approach, you can manually adjust the data by subtracting the average value of the dependent and independent variables over each level of Categ_Var, as shown in Allison (2006), "Fixed Effects Regression Methods in SAS", p. 4-6.
Since the data are already sorted by Categ_Var, you can run PROC MEANS with a BY Categ_Var statement to save the group means to an output data set, then use a DATA step to merge the output with your data and subtract the means.
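A minimal sketch of that manual adjustment, assuming the variable names from the PROC GLM step earlier in this thread; the data set names grpmeans and demeaned and the m_ and _c variable names are hypothetical. The final step uses the HCC option in PROC REG to request heteroscedasticity-consistent standard errors on the demeaned data.

```sas
/* Means of the dependent and independent variables within each level */
/* of categ_var. The BY statement requires data sorted by categ_var.  */
proc means data=mydata noprint;
   by categ_var;
   var dependent indep_var1 indep_var2;
   output out=grpmeans(drop=_type_ _freq_) mean=m_dep m_x1 m_x2;
run;

/* Merge the group means back on and subtract them. */
data demeaned;
   merge mydata grpmeans;
   by categ_var;
   dep_c = dependent  - m_dep;
   x1_c  = indep_var1 - m_x1;
   x2_c  = indep_var2 - m_x2;
run;

/* Regress the deviations; NOINT because demeaning removes the intercept. */
proc reg data=demeaned;
   model dep_c = x1_c x2_c / noint hcc;
run;
quit;
```

The coefficient estimates should match the ABSORB results, but the ordinary (non-robust) standard errors printed by PROC REG do not account for the degrees of freedom used up by the group means, so they should not be compared directly with the PROC GLM output.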
Other experts might suggest alternative SAS procedures that can handle this type of data without resorting to the "absorb" method.
09-23-2016 11:20 PM
You could use PROC GLMSELECT to select the most significant variables. PROC GLMSELECT can handle hundreds or even thousands of variables.
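A rough sketch of that suggestion, using the placeholder names from earlier in the thread. Stepwise selection is only one of the available methods and is chosen here as an assumption. Note that PROC GLMSELECT performs variable selection and does not by itself report robust standard errors.

```sas
/* CLASS lets GLMSELECT build the dummies for categ_var itself. */
proc glmselect data=mydata;
   class categ_var;
   model dependent = categ_var indep_var1 indep_var2 / selection=stepwise;
run;
```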