## Diff in diff in panel data

Hi everyone,

I have a dataset of workers (id) in firms (firm x id) with their salary, the year of the observation, and a variable displaced that equals 1 if the worker will be displaced from the firm the following year (if displaced=1 in 2002, the worker is displaced from the firm in 2003). Displaced was created from the disappearance of the worker from the panel, so the salary in 2003 (year t) would be 0.

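| id | firm x id | displaced | year | salary | Salary t-1 | Salary t-2 |
|----|-----------|-----------|------|--------|------------|------------|
| 1  | 12        | 0         | 2002 | 2000   | .          |            |
| 1  | 12        | 1         | 2003 | 1500   | 2000       | .          |
| 2  | 22        | 0         | 2002 | 560    | .          | .          |
| 2  | 22        | 0         | 2003 | 580    | .          | .          |
| 2  | 22        | 1         | 2004 | 600    | 580        | 560        |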

I want to run a diff-in-diff regression over several years to estimate the effect of displacement on income. The problem is that I want to use the salary at t (the displacement year) as the dependent variable, but also run the regression on past years' salaries, for example to see the influence of displacement at year t on income at year t-1. I therefore need to create, for each individual displaced at year t, variables in the same row as the salary at t that hold the income at t-1, t-2, t-3, etc. Salary t-1 and Salary t-2 in the table above are the variables I am looking for.

Do you know if it is at least possible to do something like this?

I would be very grateful for help!!!

Eugénie

## Re: Diff in diff in panel data

I assume (a) you have annual records for each ID, (b) every ID has a displacement, and (c)  the year of displacement is always the last record for a given ID.  Then this will work:

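```sas
data want;

  /* Read all records of one ID per pass; N_years ends as the record count */
  do N_years=1 by 1 until (last.id);
    set have;
    by id notsorted;
    /* LAG runs on every record, so its queue carries over across IDs */
    sal_T_minus_1=lag(salary);
    sal_T_minus_2=lag2(salary);
    sal_T_minus_3=lag3(salary);
    sal_T_minus_4=lag4(salary);
  end;

  array sal{4} sal_T_minus_1-sal_T_minus_4;

  /* Only N_years-1 lags belong to this ID; set the rest to missing */
  if N_years<=4 then do N=N_years to 4;
    sal{N}=.;
  end;
  drop N;

run;
```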

1. This program writes out only 1 record per ID, since your regression only needs the data you create for the last within-id record.

2. If an ID is short (has few records), then some of the lagged salary values will be contaminated by the prior ID. For instance, if an ID has only 2 records, then the 2-, 3- and 4-year lags will be contaminated. This is corrected in the "if N_years<=4 then do N=N_years to 4" loop (and yes, it is possible to use the variable N_years not only as the index of the first loop, but also as a bound on the index of the second).
3. If you want more than 4 pre-displacement years, then
   1. add LAG statements in the first DO loop
   2. increase the size of the SAL array
   3. modify the bounds in the second DO loop
4. In addition to salary history, you also have an additional variable - N_years - which is a count of annual records for each ID.
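
To try the program on a small case, here is a minimal sketch that builds HAVE from the five sample rows in the question. The variable name `firm_id` is my own stand-in for the "firm x id" column (an assumption, since that label is not a valid SAS name), and the rows satisfy assumptions (a) to (c) above.

```sas
/* Sample rows copied from the question; firm_id is a hypothetical name */
data have;
  input id firm_id displaced year salary;
  datalines;
1 12 0 2002 2000
1 12 1 2003 1500
2 22 0 2002 560
2 22 0 2003 580
2 22 1 2004 600
;
run;

/* Inspect the input before running the DATA WANT step */
proc print data=have noobs;
run;
```

Running the DATA WANT step on this input should give one row per id: for id 1, N_years=2 and sal_T_minus_1=2000; for id 2, N_years=3, sal_T_minus_1=580 and sal_T_minus_2=560. All other lagged salaries are missing, which matches the Salary t-1 and Salary t-2 columns in the question.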