Diff in diff in panel data

Occasional Contributor
Posts: 11

Hi everyone, 


I have a dataset of workers (id) in firms (firm x id), with their salary, the year of the observation, and a variable displaced equal to 1 if the worker will be displaced from the firm the following year (if displaced=1 in 2002, the worker is displaced from the firm in 2003). Displaced was created from the disappearance of the worker from the panel, so the salary in the year after displaced=1 is 0.


id   firm x id   displaced   year   salary   Salary t-1   Salary t-2
1    12          0           2002   2000     .            .
1    12          1           2003   1500     2000         .
2    22          0           2002   560      .            .
2    22          0           2003   580      .            .
2    22          1           2004   600      580          560


I want to run a diff-in-diff regression over several years to estimate the effect of displacement on salary. The problem is that I want to use the salary at t (the displacement year) as the dependent variable, but also run the regression on past years' salaries, to see for example the influence of displacement at year t on salary at year t-1. So for each individual displaced at year t, I have to create variables, in the same row as the salary at t, giving the salary at t-1, t-2, t-3, etc. Salary t-1 and Salary t-2 in the table above are the variables I am looking for.


Do you know if it is at least possible to do something like this? 


I would be very grateful for help!!!

Thank you in advance, 






Trusted Advisor
Posts: 1,309

Re: Diff in diff in panel data

Posted in reply to eugenia67

I assume (a) you have annual records for each ID, (b) every ID has a displacement, and (c)  the year of displacement is always the last record for a given ID.  Then this will work:


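data want;

  /* Read all records of one ID; LAG must run on every record to fill its queue */
  do N_years=1 by 1 until (last.id);
    set have ;
    by id notsorted;
    sal_T_minus_1=lag1(salary);
    sal_T_minus_2=lag2(salary);
    sal_T_minus_3=lag3(salary);
    sal_T_minus_4=lag4(salary);
  end;

  array sal{4} sal_T_minus_1-sal_T_minus_4;

  /* Lags of N_years or more come from the prior ID: reset them to missing */
  if N_years<=4 then do N=N_years to 4;
    sal{N}=.;
  end;
  drop N;
run;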

  1. This program writes out only 1 record per ID, since your regression only needs the data you create for the last within-id record.

  2. If an ID is short (has few records) then some of the lagged salary values will be contaminated by the prior ID.  For instance, if an ID has only 2 records, then the 2-, 3- and 4-year lags will be contaminated.  This is corrected in the "if N_years<=4 then do N=N_years to 4" loop (and yes, it is possible to use the variable N_years not only as the index of the first loop, but also as a bound in the second).
  3. If you want more than 4 pre-displacement years then
    1. add LAG statements in the first DO loop
    2. increase the size of the SAL array
    3. Modify the bounds in the second DO loop
  4. In addition to salary history, you also have an additional variable - N_years - which is a count of annual records for each ID.
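
As a quick check, the program can be run on the five sample rows from the question. This is only a sketch under assumptions: the input dataset is called have, the salary variable is called salary, and firm_id is a stand-in name for the "firm x id" column. The LAG calls and the reset loop follow the notes above.

```sas
/* Sample rows from the question; firm_id is an assumed name for "firm x id" */
data have;
  input id firm_id displaced year salary;
  datalines;
1 12 0 2002 2000
1 12 1 2003 1500
2 22 0 2002 560
2 22 0 2003 580
2 22 1 2004 600
;
run;

data want;
  /* Read all records of one ID; LAG runs on every record so its queue stays in step */
  do N_years=1 by 1 until (last.id);
    set have;
    by id notsorted;
    sal_T_minus_1=lag1(salary);
    sal_T_minus_2=lag2(salary);
    sal_T_minus_3=lag3(salary);
    sal_T_minus_4=lag4(salary);
  end;

  array sal{4} sal_T_minus_1-sal_T_minus_4;

  /* Lags of N_years or more reach back into the previous ID: set them to missing */
  if N_years<=4 then do N=N_years to 4;
    sal{N}=.;
  end;
  drop N;
run;

/* One row per ID: the last record plus its salary history and N_years */
proc print data=want noobs;
run;
```

On these rows, id 1 should keep only sal_T_minus_1=2000, and id 2 should get sal_T_minus_1=580 and sal_T_minus_2=560 with the remaining lags missing, which matches the Salary t-1 and Salary t-2 columns in the question.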

Occasional Contributor
Posts: 11

Re: Diff in diff in panel data

Posted in reply to eugenia67
Hi Mark! Thank you so much, I will try this! Unfortunately, the displacement year is not actually the last record of each id, because a worker can find a job in a new firm the next year, or even several years later, and reappear in the panel. Nevertheless, I can use the id x firm identifier, since when a worker is displaced that id x firm code appears for the last time. I just have to check that the id has been in the firm for at least 4 years to do that.

I already used the id x firm code to identify displaced ids that are re-employed the next year. I compared the last.idxfirm year to the last.id year, and if these two are different in two years (the first year being the displacement year), then the worker is re-employed. Maybe this info will help me!

I will also have to do the same exercise for salary_T_plus_1, salary_T_plus_2, etc. up to 4 (a year in which the id does not appear in the records, because the worker is not employed, will be marked as "0 salary"). So what should I use in place of lag?

Thank you so much !
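
On the question of what replaces LAG: the DATA step has no LEAD counterpart to LAG, so future values are usually fetched another way. One possible sketch, not a definitive answer, is a self-join on id and calendar year in PROC SQL. It assumes the dataset is named have and holds at most one record per id and year; want_leads and the aliases b1 to b4 are illustrative names only. Because it matches on year rather than on record position, a year in which the id is absent from the panel gets salary 0 through COALESCE.

```sas
/* Sketch: future salaries by calendar year, assuming one record per id and year */
proc sql;
  create table want_leads as
  select a.*,
         /* no matching record for that year -> treated as 0 salary */
         coalesce(b1.salary, 0) as salary_T_plus_1,
         coalesce(b2.salary, 0) as salary_T_plus_2,
         coalesce(b3.salary, 0) as salary_T_plus_3,
         coalesce(b4.salary, 0) as salary_T_plus_4
  from have as a
       left join have as b1 on a.id=b1.id and b1.year=a.year+1
       left join have as b2 on a.id=b2.id and b2.year=a.year+2
       left join have as b3 on a.id=b3.id and b3.year=a.year+3
       left join have as b4 on a.id=b4.id and b4.year=a.year+4
  order by a.id, a.year;
quit;
```

Years beyond the end of the panel would also come out as 0 here, so the last years of the sample probably need separate handling.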