Hi
I wanted to create lags for a few variables in a panel dataset.
It seems the straightforward approach(below) works for me.
data funda1;
set funda;
by gvkey year;
diff = dif (Year);
mve=shares*price
mve_lag=lag(mve);
if first.gvkey then mve_lag=.;
if diff > 1 then mve_lag=.;
tobinq = (mve+at-ceq-txdb)/at;
tobinq_lag=lag(tobinq);
if first.gvkey then tobinq_lag=.;
if diff > 1 then tobinq_lag=.;
at_lag=lag(at);
if first.gvkey then at_lag=.;
if diff > 1 then at_lag=.;
run;
data funda2;
set funda1;
by gvkey year;
if at_lag='.' then delete;
if tobinq_lag='.' then delete;
if mve_lag='.' then delete;
if funded_lag='.' then delete;
capx_s=capx/at_lag;
rd_s=rd/at_lag;
if capx_s ='.' then delete;
if rd_s='.' then delete;
if year lt 1990 then delete;
if year gt 1998 then delete;
if dif(year) ne 1 or gvkey ne lag(gvkey) then group+1;/* this step and proc sql below is to ensure atleast 3 consecutive year observations for all firms in dataset*/
run;
Now in the above code,I have specified the command for deleting missing first lags(for instance:"if at_lag='.' then delete") in the second data step. When I put that statement immediately after every step of lag creation(i.e., after" if diff > 1 then mve_lag=.;"), the first lag of each variable seems to take on the lag_value of the previous gvkey and does not return as '.'.
This might be a very simple question but I am curious as to why does that happen. Could anybody please explain?
Thanks.