Fixed effect with clustered standard errors? proc glm?


Posted 10-18-2019 10:35 AM
(5158 views)

Dear All,

I was wondering how I can run a fixed-effects regression with clustered standard errors. I have panel data in which individuals are observed multiple times. I would like to run the regression with individual fixed effects and with standard errors clustered by individual. Since I have several thousand individuals, using a CLASS statement with PROC SURVEYREG is really inefficient, and SAS reports insufficient memory. So I don't think I can use PROC SURVEYREG.

Can I achieve this using PROC GLM or PROC MODEL? I searched but didn't find a clear way to do so. Thanks in advance.

13 REPLIES


Maybe PROC GLM with a WEIGHT statement? https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...

From the documentation: "If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least squares estimates are best linear unbiased estimators (BLUE)"

--

Paige Miller


Isn't WLS about heteroscedasticity (i.e., variance) while clustering standard errors is about covariance within a unit (having multiple observations)? I think they are two different issues.
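To make that distinction concrete, here is a sketch in standard textbook notation. The symbols $W$, $X_g$, $\hat u_g$ and $G$ are not from this thread; they are assumed here for illustration. WLS changes the coefficient estimator itself through the weight matrix $W$. Clustering leaves the OLS coefficients untouched and only replaces the variance estimator with a sandwich form that allows arbitrary correlation among the residuals $\hat u_g$ within each cluster $g = 1, \dots, G$.

```latex
\begin{align*}
\hat\beta_{\mathrm{WLS}} &= (X^\top W X)^{-1} X^\top W y \\
\hat\beta_{\mathrm{OLS}} &= (X^\top X)^{-1} X^\top y \\
\widehat{V}_{\mathrm{cluster}}\bigl(\hat\beta_{\mathrm{OLS}}\bigr)
  &= (X^\top X)^{-1}
     \Bigl(\sum_{g=1}^{G} X_g^\top \hat u_g \hat u_g^\top X_g\Bigr)
     (X^\top X)^{-1}
\end{align*}
```

Software typically multiplies the last expression by a finite-sample correction factor, which varies by procedure and is omitted here.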


How are you thinking about including cluster in any model you would fit?

--

Paige Miller


I'm not sure if I understand your suggestion.

What I would like to do is to include IDs as fixed effects and get standard errors clustered by IDs at the same time. I know it's possible with PROC SURVEYREG, but when I have many ID values, it's practically impossible. So I'm looking for another procedure.


@braam wrote:

... and get standard errors clustered by IDs at the same time.

Now this implies that the standard errors clustered by IDs are the output of the regression. Is that correct? I thought the standard errors were inputs to a regression.

--

Paige Miller


Sorry for the confusion. Yes, I would like to 1) have clustered standard errors and 2) include individual-fixed effects.


Show us the SURVEYREG code you were thinking of using, even if it doesn't work because there are too many individuals.

--

Paige Miller


This is the code that you requested. In this example, having too many values for Origin would make this type of regression really inefficient; on my data it takes several hours or more.

Below it is the GLM code, where I cannot cluster standard errors. There I absorb Origin rather than estimating its fixed effects. I expected the same coefficient on Cylinders from the two approaches, but they differ, which is strange to me.

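```
/* Origin as a CLASS effect, standard errors clustered by Origin */
proc surveyreg data=sashelp.cars;
    cluster Origin;
    class Origin Type;
    model EngineSize = Cylinders Origin Type / solution;
run;

/* Origin absorbed instead of estimated; no clustering here */
proc glm data=sashelp.cars;
    absorb Origin;
    class Type;
    model EngineSize = Cylinders Type / solution;
run;
```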

SURVEYREG RESULT

Estimated Regression Coefficients

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | -0.2423962 | 0.24823069 | -0.98 | 0.4318 |
| Cylinders | 0.6195316 | 0.03299998 | 18.77 | 0.0028 |
| Origin Asia | -0.2473363 | 0.02963121 | -8.35 | 0.0141 |
| Origin Europe | -0.4510775 | 0.00538821 | -83.72 | 0.0001 |
| Origin USA | 0.0000000 | 0.00000000 | . | . |
| Type Hybrid | -0.1485498 | 0.10737472 | -1.38 | 0.3007 |
| Type SUV | 0.2723754 | 0.09245885 | 2.95 | 0.0985 |
| Type Sedan | -0.0206628 | 0.05296500 | -0.39 | 0.7341 |
| Type Sports | 0.1480223 | 0.17265540 | 0.86 | 0.4816 |
| Type Truck | 0.5319361 | 0.11385004 | 4.67 | 0.0429 |
| Type Wagon | 0.0000000 | 0.00000000 | . | . |

GLM RESULT

| Parameter | Estimate | | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Cylinders | 0.6292556337 | | 0.01473441 | 42.71 | <.0001 |
| Type Hybrid | -.1535480401 | B | 0.23825545 | -0.64 | 0.5196 |
| Type SUV | 0.2436920500 | B | 0.08982120 | 2.71 | 0.0070 |
| Type Sedan | -.0144629620 | B | 0.07368536 | -0.20 | 0.8445 |
| Type Sports | 0.0949267303 | B | 0.09199753 | 1.03 | 0.3028 |
| Type Truck | 0.4970593441 | B | 0.10899489 | 4.56 | <.0001 |
| Type Wagon | 0.0000000000 | B | . | . | . |


This seems to be a problem that I will have to think about, as I don't see an obvious path forward right now. A large number of levels in any class variable does cause this problem, where you either run out of memory or the run takes a very long time.

How were you going to handle the issue that SAS always assigns a standard error of zero to one (or more) of the class levels?

--

Paige Miller


To get the same parameter estimates, you need to specify NOINT in the SURVEYREG procedure:

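```
/* ABSORB requires the input to be sorted by the absorbed variable */
proc sort data=sashelp.cars out=cars;
    by Origin;
run;

/* Origin as an explicit CLASS effect; NOINT drops the intercept */
proc surveyreg data=cars;
    cluster Origin;
    class Origin Type;
    model EngineSize = Cylinders Origin Type / noint solution;
    ods select parameterestimates;
run;

/* Origin absorbed instead of estimated */
proc glm data=cars;
    absorb Origin;
    class Type;
    model EngineSize = Cylinders Type / solution;
    ods select parameterestimates;
quit;
```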


Thanks! I confirmed it! One thing that is interesting to me is that the coefficient on Cylinders is 0.619 both ways, but the t-statistics differ a lot: 18.77 for SURVEYREG versus 46.32 for GLM.

Is it because absorbing fixed effects (conceptually, demeaning) influences the variance-covariance matrix?
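On the t-statistics: the gap most likely comes from the variance estimator rather than from the absorption itself. PROC GLM reports model-based standard errors, while PROC SURVEYREG with a CLUSTER statement reports cluster-based ones and ties the degrees of freedom to the number of clusters (only three Origin values here).

Since demeaning came up, one possible way around the memory problem is to do the demeaning yourself and then cluster on the demeaned data, so the ID never appears in a CLASS statement. The following is only a sketch under assumptions: Origin stands in for the individual ID, Type is dropped to keep the example short (so the numbers will not match the output above), and the data set names `cars` and `cars_within` are made up for illustration. The standard errors from this approach do not adjust the degrees of freedom for the absorbed means, so compare them against a small case you can also run the slow way.

```sas
/* Keep complete cases so the group means use the same rows as the regression, */
/* and sort by the cluster/ID variable (Origin stands in for the ID here)      */
proc sort data=sashelp.cars(where=(not missing(Cylinders) and not missing(EngineSize)))
          out=cars;
    by Origin;
run;

/* Within transformation: METHOD=MEAN subtracts each BY-group mean */
/* and leaves the scale unchanged                                  */
proc stdize data=cars out=cars_within method=mean;
    by Origin;
    var EngineSize Cylinders;
run;

/* No CLASS effect for the ID; NOINT because the demeaned variables */
/* have mean zero within every group                                */
proc surveyreg data=cars_within;
    cluster Origin;
    model EngineSize = Cylinders / noint;
run;
```

With a real panel, the BY and CLUSTER variable would be the individual ID, and every continuous regressor in the model would go on the VAR statement.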


If you have panel data, try posting this in the Forecasting forum. Also try PROC PANEL.
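A minimal sketch of what the PROC PANEL route might look like, assuming SAS/ETS is licensed. The data set `mypanel` and the variables `person`, `year`, `y`, `x1` and `x2` are hypothetical names, not from this thread. To my understanding, the MODEL statement in recent SAS/ETS releases accepts a CLUSTER option together with HCCME=, but please check the PROC PANEL documentation for your version before relying on it.

```sas
/* PROC PANEL expects the data sorted by cross section, then time */
/* (mypanel, person, year, y, x1, x2 are hypothetical names)      */
proc sort data=mypanel out=panel_sorted;
    by person year;
run;

proc panel data=panel_sorted;
    id person year;                   /* cross-section ID, then time ID      */
    model y = x1 x2 / fixone          /* one-way (individual) fixed effects  */
                      hccme=1         /* heteroscedasticity-consistent form  */
                      cluster;        /* cluster-corrected covariance matrix */
run;
```

FIXONE handles the individual effects without a CLASS statement, which is the part that was exhausting memory in PROC SURVEYREG.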
