Hi I am trying to get an output which is similar to creating lags but lag functions doesn't give me desired output. Please suggest how to get the output like shown below.
Mainly I want to create a new variable "l_no" which looks back to last year value from "no" variable by id and yr variable (if no value available for last year then missing).
data sin_x;
input id yr no;
datalines;
1 2015 41
1 2014 2
2 2012 3
2 2011 4
3 2010 7
3 2009 0
3 2008 4
4 2007 0
4 2006 7
;
Desired output:
id yr no l_no
1 2015 41 2
1 2014 2 .
2 2012 3 4
2 2011 4 .
3 2010 7 0
3 2009 0 4
3 2008 4 .
4 2007 0 7
4 2006 7 .
Appreciate your help.
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; data want; if _n_=1 then do; if 0 then set sin_x(rename=(yr=_yr no=i_no)); declare hash h(dataset:'sin_x(rename=(yr=_yr no=i_no))'); h.definekey('id','_yr'); h.definedata('i_no'); h.definedone(); end; call missing(of _all_); set sin_x; _yr=yr-1; rc=h.find(); drop _: rc; run;
Go for a left join:
proc sql;
create table desired as
select a.id, a.yr, b.no as l_no
from sin_x as a left join sin_x as b on a.id=b.id and a.yr=b.yr+1
order by id, yr desc;
select * from desired;
quit;
If you like a data step program here you go:
The maximum group size is set to 3. This can be changed to the actual max number. It can be programatically found.
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; %let groupMax = 3; data want; array k[&groupMax] _temporary_; do i = 1 by 1 until(last.id); set sin_x; by id; k[i] = no; end; do j = 1 to dim(k) -1; l_no = k[j + 1]; output; end; call missing(of k[*]); drop i j; run; proc print data = want; run;
Sort data by ID year.
Use Lag, pay attention to id
resort by id descending year
data sin_x; input id yr no; datalines; 1 2015 41 1 2014 2 2 2012 3 2 2011 4 3 2010 7 3 2009 0 3 2008 4 4 2007 0 4 2006 7 ; run; data want; if _n_=1 then do; if 0 then set sin_x(rename=(yr=_yr no=i_no)); declare hash h(dataset:'sin_x(rename=(yr=_yr no=i_no))'); h.definekey('id','_yr'); h.definedata('i_no'); h.definedone(); end; call missing(of _all_); set sin_x; _yr=yr-1; rc=h.find(); drop _: rc; run;
LAG does exactly what you want. You just have your data sorted incorrectly. Also remember to call LAG() on every observation.
proc sort data=have; by id yr; run;
data want ;
set have ;
by id yr ;
lag_no=lag(no);
if not first.id then l_no=lag_no;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.