BookmarkSubscribeRSS Feed
stoffprof
Fluorite | Level 6

I am having trouble figuring out how to estimate a particular type of fixed effects regression. My data looks something like this:

CompanyDateOutputManager 1Manager 2Manager 3
112/31/200250010552291.
13/31/200252010552291.
212/31/2002180229155387721
23/31/2002178229155387721
26/30/200218855387721.
39/30/20027597721..

There are a few thousand companies, each with about 100 observations on different dates. Each company has up to 3 managers, which are identified by a numeric code.

I would like to estimate fixed effects for each manager in a regression of output on a number of controls (not shown). I'm having trouble thinking about how to do this given the structure of the data. It's important to notice that the particular column where a manager ID is listed has no intrinsic meaning -- it's just a list of which managers are at the company, and there can be up to 3 at a time. (They're sorted, so if a lower-ID manager leaves, all the IDs shift to the left, as happens for the third observation of company 2 above.)

There are about 2,000 different manager IDs, so there are many fixed effects. (That is, the numbers 1055, 2291, 5538, and 7721 are just four of the 2,000 possible values.)

I've been trying to think of how to estimate such a model. One approach would be to manually create a dummy variable matrix in SAS/IML and run the regression, but I'm running into memory problems. To be clear, for the data above, the dummy variables would look like this:

mgr1055mgr3765mgr5538mgr7721
1100
1100
1011
1011
0011
0001

The other way I thought might work would be to reshape the data so that it's "long", so that, for example, the first observation would generate two observations:

CompanyDateOutputManager
112/31/20025001055
112/31/20025002291

This would easily generate the fixed effects, but the standard errors would be wrong because the regression doesn't "know" that this is just one observation. Perhaps clustering could fix this, but it's not clear to me if the regression would be correctly specified.

So... is there a way to do this with PROC GLM or some other procedure? Any guidance on this would be much appreciated!

4 REPLIES 4
PGStats
Opal | Level 21

You could build on something like this:

data have(where=(not missing(manager)));

dum = 1;

input Company Date :mmddyy10. Output Manager_1 Manager_2 Manager_3;

format date .;

lOutput = log10(output);

manager = manager_1; output;

manager = manager_2; output;

manager = manager_3; output;

keep company date lOutput manager dum;

datalines;

1 12/31/2002 500 1055 2291 .

1 3/31/2002 520 1055 2291 .

2 12/31/2002 180 2291 5538 7721

2 3/31/2002 178 2291 5538 7721

2 6/30/2002 188 5538 7721 .

3 9/30/2002 759 7721 . .

;

proc transpose data=have out=wide(drop=_name_) prefix=mgr;

by company date lOutput notsorted;

var dum;

id manager;

run;

data wide;

set wide;

array mgr{*} mgr:;

do i = 1 to dim(mgr);

  if missing(mgr{i}) then mgr{i} = 0;

  end;

year = year(date);

drop i;

run;

proc glm data=wide;

class company year;

model lOutput =  company year mgr: / ss3 solution;

run;

PG

PG
Dm95
Calcite | Level 5

Hello,

 

Sorry to ask question directly to your responses, but I am unable to ask a new question because every time I try, it asks me to select a board but there are no options to select a board. 

 

In regards to my question, I am wanting to run a multiple regression, however, for two of the variables, I need to convert them into fixed effects and I am unsure on how to do that. The two variables are Year and SIC2. Below is my sample code. 

 

proc reg data=auditdata;
model Ln_Fees=Auditlag logat Aud_Change INVET REC Loss Big4 Busy Influence Audit_Tenure LEV LOSS City_ind_fees ROA bus_seg geo_seg Year SIC2 / COLLIN;
Run;

Again, sorry to ask questions directly to your responses, but I'm not sure on what else to do.

 

Thank you!

StatsMan
SAS Super FREQ

PROC REG only does multiple linear regression with continuous effects. If you have classification effects, effects that you need to create dummy columns for in your design matrix, then you need to use a procedure that has a CLASS statement. If your model only involves fixed effects, then GLM's CLASS statement will handle those for you. Your fixed effect variable names and your continuous variables both go on the MODEL statement. So if A is a fixed classification effect and X1, X2 are continuous effects you would have

 

proc glm data=yourdata;

   class A;

   model y=A X1 X2;

run;

 

There is no collinearity check in GLM. You can still run your PROC REG model with just the continuous effects to check for collinearity in them. Check out the GLM documentation for information on the syntax for that procedure. The options available are different than those options that would be appropriate for a model that only has continuous effects (PROC REG)

StatDave
SAS Super FREQ

Is it really true that a manager (like 2291) can be in more than one company?? That seems unlikely. The important thing is to define which output values are in a cluster, where the output values in a cluster are assumed to be correlated. Are all output values in a company correlated and are therefore a cluster? 

 

Also important is what you want this model to answer - do you want to estimate or test something regarding managers? if so, one or more output values should be clearly associated in your data with each manager. 

 

Another important issue is the distribution of the output values - is it approximately normal, or is it skewed like gamma, etc. This will help choose the procedure to use.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 3723 views
  • 7 likes
  • 5 in conversation