Programming the statistical procedures from SAS

Regression with multiple fixed effects per observation

Reply
Occasional Contributor
Posts: 16

Regression with multiple fixed effects per observation

I am having trouble figuring out how to estimate a particular type of fixed effects regression. My data looks something like this:

CompanyDateOutputManager 1Manager 2Manager 3
112/31/200250010552291.
13/31/200252010552291.
212/31/2002180229155387721
23/31/2002178229155387721
26/30/200218855387721.
39/30/20027597721..

There are a few thousand companies, each with about 100 observations on different dates. Each company has up to 3 managers, which are identified by a numeric code.

I would like to estimate fixed effects for each manager in a regression of output on a number of controls (not shown). I'm having trouble thinking about how to do this given the structure of the data. It's important to notice that the particular column where a manager ID is listed has no intrinsic meaning -- it's just a list of which managers are at the company, and there can be up to 3 at a time. (They're sorted, so if a lower-ID manager leaves, all the IDs shift to the left, as happens for the third observation of company 2 above.)

There are about 2,000 different manager IDs, so there are many fixed effects. (That is, the numbers 1055, 2291, 5538, and 7721 are just four of the 2,000 possible values.)

I've been trying to think of how to estimate such a model. One approach would be to manually create a dummy variable matrix in SAS/IML and run the regression, but I'm running into memory problems. To be clear, for the data above, the dummy variables would look like this:

mgr1055mgr3765mgr5538mgr7721
1100
1100
1011
1011
0011
0001

The other way I thought might work would be to reshape the data so that it's "long", so that, for example, the first observation would generate two observations:

CompanyDateOutputManager
112/31/20025001055
112/31/20025002291

This would easily generate the fixed effects, but the standard errors would be wrong because the regression doesn't "know" that this is just one observation. Perhaps clustering could fix this, but it's not clear to me if the regression would be correctly specified.

So... is there a way to do this with PROC GLM or some other procedure? Any guidance on this would be much appreciated!

Respected Advisor
Posts: 4,756

Re: Regression with multiple fixed effects per observation

You could build on something like this:

data have(where=(not missing(manager)));

dum = 1;

input Company Date :mmddyy10. Output Manager_1 Manager_2 Manager_3;

format date .;

lOutput = log10(output);

manager = manager_1; output;

manager = manager_2; output;

manager = manager_3; output;

keep company date lOutput manager dum;

datalines;

1 12/31/2002 500 1055 2291 .

1 3/31/2002 520 1055 2291 .

2 12/31/2002 180 2291 5538 7721

2 3/31/2002 178 2291 5538 7721

2 6/30/2002 188 5538 7721 .

3 9/30/2002 759 7721 . .

;

proc transpose data=have out=wide(drop=_name_) prefix=mgr;

by company date lOutput notsorted;

var dum;

id manager;

run;

data wide;

set wide;

array mgr{*} mgr:;

do i = 1 to dim(mgr);

  if missing(mgr{i}) then mgr{i} = 0;

  end;

year = year(date);

drop i;

run;

proc glm data=wide;

class company year;

model lOutput =  company year mgr: / ss3 solution;

run;

PG

PG
Ask a Question
Discussion stats
  • 1 reply
  • 266 views
  • 3 likes
  • 2 in conversation