data test;
input ln $ step $ date1 date9.;
format date1 date9.;
datalines;
0122 B 01jun2019
0122 B 07jun2019
0122 C 06jun2019
;
run;
data test2;
set test;
if step='B' then b_step=date1;
if step='C' then c_step=date1;
format b_step c_step mmddyy10.;
run;
ln | step | date1 | b_step | c_step |
0122 | B | 01Jun2019 | 06/01/2019 | |
0122 | B | 07Jun2019 | 06/07/2019 | |
0122 | C | 06Jun2019 | 06/06/2019 |
The above query produces this output. The issue is I need to evaluate both the date in b_step and c_step however as you can they are on different rows because there was no C step on the other dates. Since there is not way to evaluate rows unless they are on the same row, is there a way to program the code to get the c_step date on the same row as the b_step. So 6/6/2019 in c_step would move to the top row. If not then is there a way to evaluate b_step and c_step in its current condition (ie if b_step and c_step >0 then F= 'Yes' ;else F='No';
Check next code:
data test;
input ln $ step $ date1 date9.;
format date1 date9.;
datalines;
0122 B 01jun2019
0122 B 07jun2019
0122 C 06jun2019
;
run;
data test2;
set test;
by ln ;
retain _all_;
if step='B' then b_step=date1;
if step='C' then c_step=date1;
if last.ln then do; if b_step and c_step >0 then F= 'Yes' ;else F='No'; output; end;
keep ln b_step c_step F;
format b_step c_step mmddyy10.;
run;
Your solution puts the c_step on the top row however this is the actual output desired
Original output | ||||
ln | step | date1 | b_step | c_step |
0122 | B | 01Jun2019 | 06/01/2019 | |
0122 | B | 07Jun2019 | 06/07/2019 | |
0122 | C | 06Jun2019 | 06/06/2019 | |
Desired Output | ||||
ln | step | date1 | b_step | c_step |
0122 | B | 01Jun2019 | 06/01/2019 | 06/06/2019 |
0122 | B | 07Jun2019 | 06/07/2019 | |
0122 | C | 06Jun2019 |
Idea is to show c_step on same row as the B_step so I can evaluate borizontally. By nature the steps will have different rows due to different step names "B" and "C"
I don't fully understand your description of what you want. Please post what the resulting data would actually look like.
Original output | ||||
ln | step | date1 | b_step | c_step |
0122 | B | 01Jun2019 | 06/01/2019 | |
0122 | B | 07Jun2019 | 06/07/2019 | |
0122 | C | 06Jun2019 | 06/06/2019 | |
Desired Output | ||||
ln | step | date1 | b_step | c_step |
0122 | B | 01Jun2019 | 06/01/2019 | 06/06/2019 |
0122 | B | 07Jun2019 | 06/07/2019 | |
0122 | C | 06Jun2019 |
Make a minor modification to the way you make dataset TEST. Then it is straightforward:
data test;
input ln $ step $ date1 date9.;
if step='B' then b_step=date1; else
if step='C' then c_step=date1;
format date1 b_step c_step date9.;
datalines;
0122 B 01jun2019
0122 B 07jun2019
0122 C 06jun2019
run;
data test2;
merge test (where=(step='C') in=inc)
test;
by ln ;
output;
call missing(of _all_);
run;
The MERGE statement merges the single step='C' record with all three records. Ordinarily that means the variable C_STEP would be present in all 3 resulting records (because it's a one-to-many merge).
But because it uses an explicit OUTPUT statement followed by a CALL MISSING(of _ALL_), the C_STEP inserted into record 1 was set to missing and was not restored until it was re-read in the 3rd record.
Much like in life, the SAS data step is good at "knowing the past" but not as good at "knowing the future" (it's just an analogy)
Thus, SAS has the LAG function, but not a LEAD function.
I generally avoid LAG when a RETAIN works just as well, but that's just my coding style.
However, it's easy to use PROC SQL's Cartesian product plus a surrogate key to implement both LEAD and LAG:
data test;
input ln $ step $ date1 date9.;
format date1 date9.;
datalines;
0122 B 01jun2019
0122 B 07jun2019
0122 C 06jun2019
;
run;
data have/view=have;
sk+1;
set test;
run;
proc sql;
create table want as
select a.ln
,a.step
,a.date1
,b.date1 as date1_lag
,c.date1 as date1_lead
from have a
full join have b
on a.ln=b.ln and a.sk=b.sk+1
full join have c
on a.ln=c.ln and a.sk+1=c.sk
where a.ln is not missing
;
quit;
There are other approaches:
but SQL + Cartesian + SK is my go-to approach as long as data volumes don't cause performance issues due to the Cartesian product.
HTH...
I would argue that there really isn't a true LAG function in SAS either. This misleadingly-named function should really be called something like UPDFIFOQ or UFQ (for "update fifo queue").
I bet using such queue-oriented nomenclature would not only reduce the frequent lament at the absence of "LEAD" function, but would also enhance understanding when, how, and whether to use the function in an if statement. And using the notion of updating queues would be a better fit for your observation that the "data step is good at knowing the past".
And of course, the data step can actually look ahead about as easily as it can look back:
data past_present_future;
merge have
have (firstobs=2 keep=x rename=(x=next_x));
if _n_>1 then set have (keep=x rename=(x=past_x));
run;
I would argue that there really isn't a true LAG function in SAS either.
I dunno...it's listed in the doc as a function 😉 IIRC with a number of caveats - it's been a while since I read the doc on LAG and am too lazy to look it up right now.
This misleadingly-named function should really be called something like UPDFIFOQ or UFQ (for "update fifo queue").
SASWare Ballot? 😉
I just never really use LAG (LAG1 that is):
data class;
length lag_name $10;
retain lag_name;
lag_name=name;
set sashelp.class;
run;
Or perhaps:
proc sort data=sashelp.class out=class (keep=name age);
by age name;
run;
data want;
length lag_name $10;
retain lag_name;
lag_name=name;
set class;
by age name;
if first.age then call missing(lag_name);
run;
But it gets more cumbersome the more grouping columns you have.
Things get trickier using this approach if I need LAG2, LAG3, etc, but I rarely have a need for that.
And of course, the data step can actually look ahead about as easily as it can look back:
data past_present_future; merge have have (firstobs=2 keep=x rename=(x=next_x)); if _n_>1 then set have (keep=x rename=(x=past_x)); run;
Define "easily" lol. I'll define it as "intuitively obvious to someone with less than 3 years SAS experience" (or easily found as an example in the doc).
But you've shown a cool technique mixing merge and double set statement. I've done something similar in the past using double (or triple) set statements, and the premature eof (firstobs=2) was problematic. I can't remember what approach I used? point=? do until eof? Anyway, now I have your approach, so thanks!
Standalone example:
data past_present_future;
format past_name name next_name;
if _n_>1 then set sashelp.class (keep=name rename=(name=past_name));
merge sashelp.class
sashelp.class (firstobs=2 keep=name rename=(name=next_name));
run;
However, how would you do the below using your approach if you needed to group (without lots of first. and last. processing + call missings)?
proc sort data=sashelp.class out=class (keep=name age);
by age name;
run;
data have/view=have;
sk+1;
set class;
run;
proc sql;
create table want as
select b.name as past_name
,a.name
,c.name as next_name
,a.age
from have a
full join have b
on a.age=b.age and a.sk=b.sk+1
full join have c
on a.age=c.age and a.sk+1=c.sk
where a.name is not missing
;
quit;
Or worse:
proc sort data=sashelp.cars out=cars (keep=make type model drivetrain);
by make type drivetrain;
run;
data have/view=have;
sk+1;
set cars;
run;
proc sql;
create table want as
select b.model as past_model
,a.model
,c.model as next_model
,a.make
,a.type
,a.drivetrain
from have a
full join have b
on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk=b.sk+1
full join have c
on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk+1=c.sk
where a.model is not missing
;
quit;
(I haven't done any in depth checking on this last example to see if the results make sense - they looked ok on a quick glance).
Edit: I have used your approach, or something similar, when performance was an issue. Say 50M records, and I needed LEAD within say two grouping columns. 50M * 50M * 50M Cartesian, even when filtered out with the join criteria, can be a bit of an "issue"! But it required a bit of first. and last. processing plus call missing statements (and a lot of testing) to get the desired results.
@ScottBass wrote:
However, how would you do the below using your approach if you needed to group (without lots of first. and last. processing + call missings)?
proc sort data=sashelp.class out=class (keep=name age); by age name; run; data have/view=have; sk+1; set class; run; proc sql; create table want as select b.name as past_name ,a.name ,c.name as next_name ,a.age from have a full join have b on a.age=b.age and a.sk=b.sk+1 full join have c on a.age=c.age and a.sk+1=c.sk where a.name is not missing ; quit;
Or worse:
proc sort data=sashelp.cars out=cars (keep=make type model drivetrain); by make type drivetrain; run; data have/view=have; sk+1; set cars; run; proc sql; create table want as select b.model as past_model ,a.model ,c.model as next_model ,a.make ,a.type ,a.drivetrain from have a full join have b on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk=b.sk+1 full join have c on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk+1=c.sk where a.model is not missing ; quit;
(I haven't done any in depth checking on this last example to see if the results make sense - they looked ok on a quick glance).
Edit: I have used your approach, or something similar, when performance was an issue. Say 50M records, and I needed LEAD within say two grouping columns. 50M * 50M * 50M Cartesian, even when filtered out with the join criteria, can be a bit of an "issue"! But it required a bit of first. and last. processing plus call missing statements (and a lot of testing) to get the desired results.
Yes, by-groups require the use of a BY statement. But that BY statement should be associated not with the MERGE or SET statements in my prior example, but to an additional SET that reads in only the BY vars, as in:
proc sort data=sashelp.cars out=cars (keep=make type model drivetrain);
by make type drivetrain;
run;
data past_present_future;
if _n_>1 then set cars (keep=model rename=(model=past_model));
merge cars
cars (firstobs=2 keep=model rename=(model=next_model));
set cars (keep=make type drivetrain);
by make type drivetrain;
if first.drivetrain then call missing(of past_:);
if last.drivetrain then call missing(of next_:);
run;
This is easily expanded to carry multiple PAST_ and NEXT_ variables, requiring only the addition of those variables (and associated RENAMEs) in the "IF ... THEN SET" statement and the 2nd argument of the MERGE statement. Just make sure to use PAST_xxx and NEXT_xxx for the renamed variables, so that the CALL MISSING can be left unchanged.
I'm not going to argue over "easily", but I do regard this code as relatively compact and straightforward. Its fundamental requirement is a level of comfort with understanding how the DATA step builds the program-data-vector. If only SAS Institute had written more meaningful documentation on SET and MERGE.
But my main point is that the DATA step need not be limited to "looking back".
regards,
Mark
Please use the proper window ("little running man" button) for posting code. The main window mangles code, so it needs all kinds of cosmetic operations before it runs and looks good.
See this code:
data test;
input ln $ step $ date1 date9.;
format date1 date9.;
datalines;
0122 B 01jun2019
0122 B 07jun2019
0122 C 06jun2019
;
data test2;
set test;
if step='B' then b_step=date1;
if step='C' then c_step=date1;
format b_step c_step mmddyy10.;
run;
data want;
if 0 then set test2;
call missing (of _all_);
merge
test2 (drop=c_step)
test2 (drop=step date1 b_step where=(c_step ne .));
;
by ln;
run;
proc print data=want noobs;
run;
Result:
ln step date1 b_step c_step 0122 B 01JUN2019 06/01/2019 06/06/2019 0122 B 07JUN2019 06/07/2019 . 0122 C 06JUN2019 . .
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.