turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Regression with multiple fixed effects per observa...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-16-2014 09:09 PM

I am having trouble figuring out how to estimate a particular type of fixed effects regression. My data looks something like this:

Company | Date | Output | Manager 1 | Manager 2 | Manager 3 |
---|---|---|---|---|---|

1 | 12/31/2002 | 500 | 1055 | 2291 | . |

1 | 3/31/2002 | 520 | 1055 | 2291 | . |

2 | 12/31/2002 | 180 | 2291 | 5538 | 7721 |

2 | 3/31/2002 | 178 | 2291 | 5538 | 7721 |

2 | 6/30/2002 | 188 | 5538 | 7721 | . |

3 | 9/30/2002 | 759 | 7721 | . | . |

There are a few thousand companies, each with about 100 observations on different dates. Each company has up to 3 managers, which are identified by a numeric code.

I would like to estimate fixed effects for each manager in a regression of output on a number of controls (not shown). I'm having trouble thinking about how to do this given the structure of the data. It's important to notice that the particular column where a manager ID is listed has no intrinsic meaning -- it's just a list of which managers are at the company, and there can be up to 3 at a time. (They're sorted, so if a lower-ID manager leaves, all the IDs shift to the left, as happens for the third observation of company 2 above.)

There are about 2,000 different manager IDs, so there are many fixed effects. (That is, the numbers 1055, 2291, 5538, and 7721 are just four of the 2,000 possible values.)

I've been trying to think of how to estimate such a model. One approach would be to manually create a dummy variable matrix in SAS/IML and run the regression, but I'm running into memory problems. To be clear, for the data above, the dummy variables would look like this:

mgr1055 | mgr3765 | mgr5538 | mgr7721 |
---|---|---|---|

1 | 1 | 0 | 0 |

1 | 1 | 0 | 0 |

1 | 0 | 1 | 1 |

1 | 0 | 1 | 1 |

0 | 0 | 1 | 1 |

0 | 0 | 0 | 1 |

The other way I thought might work would be to reshape the data so that it's "long", so that, for example, the first observation would generate two observations:

Company | Date | Output | Manager |
---|---|---|---|

1 | 12/31/2002 | 500 | 1055 |

1 | 12/31/2002 | 500 | 2291 |

This would easily generate the fixed effects, but the standard errors would be wrong because the regression doesn't "know" that this is just one observation. Perhaps clustering could fix this, but it's not clear to me if the regression would be correctly specified.

So... is there a way to do this with PROC GLM or some other procedure? Any guidance on this would be much appreciated!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-16-2014 11:03 PM

You could build on something like this:

**data have(where=(not missing(manager)));**

**dum = 1;**

**input Company Date :mmddyy10. Output Manager_1 Manager_2 Manager_3;**

**format date .;**

**lOutput = log10(output);**

**manager = manager_1; output;**

**manager = manager_2; output;**

**manager = manager_3; output;**

**keep company date lOutput manager dum;**

**datalines;**

**1 12/31/2002 500 1055 2291 .**

**1 3/31/2002 520 1055 2291 .**

**2 12/31/2002 180 2291 5538 7721**

**2 3/31/2002 178 2291 5538 7721**

**2 6/30/2002 188 5538 7721 .**

**3 9/30/2002 759 7721 . .**

**;**

**proc transpose data=have out=wide(drop=_name_) prefix=mgr;**

**by company date lOutput notsorted;**

**var dum;**

**id manager;**

**run;**

**data wide;**

**set wide;**

**array mgr{*} mgr:;**

**do i = 1 to dim(mgr);**

** if missing(mgr{i}) then mgr{i} = 0;**

** end;**

**year = year(date);**

**drop i;**

**run;**

**proc glm data=wide;**

**class company year;**

**model lOutput = company year mgr: / ss3 solution;**

**run;**

PG

PG