Hi I am trying to get an output which is similar to creating lags but lag functions doesn't give me desired output. Please suggest how to get the output like shown below.
Mainly I want to create a new variable "l_no" which looks back to last year value from "no" variable by id and yr variable (if no value available for last year then missing).
data sin_x;
input id yr no;
datalines;
1 2015 41
1 2014 2
2 2012 3
2 2011 4
3 2010 7
3 2009 0
3 2008 4
4 2007 0
4 2006 7
;
Desired output:
id yr no l_no
1 2015 41 2
1 2014 2 .
2 2012 3 4
2 2011 4 .
3 2010 7 0
3 2009 0 4
3 2008 4 .
4 2007 0 7
4 2006 7 .
Appreciate your help.
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; data want; if _n_=1 then do; if 0 then set sin_x(rename=(yr=_yr no=i_no)); declare hash h(dataset:'sin_x(rename=(yr=_yr no=i_no))'); h.definekey('id','_yr'); h.definedata('i_no'); h.definedone(); end; call missing(of _all_); set sin_x; _yr=yr-1; rc=h.find(); drop _: rc; run;
Go for a left join:
proc sql;
create table desired as
select a.id, a.yr, b.no as l_no
from sin_x as a left join sin_x as b on a.id=b.id and a.yr=b.yr+1
order by id, yr desc;
select * from desired;
quit;
If you like a data step program here you go:
The maximum group size is set to 3. This can be changed to the actual max number. It can be programatically found.
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; %let groupMax = 3; data want; array k[&groupMax] _temporary_; do i = 1 by 1 until(last.id); set sin_x; by id; k[i] = no; end; do j = 1 to dim(k) -1; l_no = k[j + 1]; output; end; call missing(of k[*]); drop i j; run; proc print data = want; run;
Sort data by ID year.
Use Lag, pay attention to id
resort by id descending year
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; data want; if _n_=1 then do; if 0 then set sin_x(rename=(yr=_yr no=i_no)); declare hash h(dataset:'sin_x(rename=(yr=_yr no=i_no))'); h.definekey('id','_yr'); h.definedata('i_no'); h.definedone(); end; call missing(of _all_); set sin_x; _yr=yr-1; rc=h.find(); drop _: rc; run;
LAG does exactly what you want. You just have your data sorted incorrectly. Also remember to call LAG() on every observation.
proc sort data=have; by id yr; run;
data want ;
set have ;
by id yr ;
lag_no=lag(no);
if not first.id then l_no=lag_no;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.