@ScottBass wrote:
However, how would you do the below using your approach if you needed to group (without lots of first. and last. processing + call missings)?
proc sort data=sashelp.class out=class (keep=name age);
by age name;
run;
data have/view=have;
sk+1;
set class;
run;
proc sql;
create table want as
select b.name as past_name
,a.name
,c.name as next_name
,a.age
from have a
full join have b
on a.age=b.age and a.sk=b.sk+1
full join have c
on a.age=c.age and a.sk+1=c.sk
where a.name is not missing
;
quit;
Or worse:
proc sort data=sashelp.cars out=cars (keep=make type model drivetrain);
by make type drivetrain;
run;
data have/view=have;
sk+1;
set cars;
run;
proc sql;
create table want as
select b.model as past_model
,a.model
,c.model as next_model
,a.make
,a.type
,a.drivetrain
from have a
full join have b
on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk=b.sk+1
full join have c
on a.make=b.make and a.type=b.type and a.drivetrain=b.drivetrain and a.sk+1=c.sk
where a.model is not missing
;
quit;
(I haven't done any in depth checking on this last example to see if the results make sense - they looked ok on a quick glance).
Edit: I have used your approach, or something similar, when performance was an issue. Say 50M records, and I needed LEAD within say two grouping columns. 50M * 50M * 50M Cartesian, even when filtered out with the join criteria, can be a bit of an "issue"! But it required a bit of first. and last. processing plus call missing statements (and a lot of testing) to get the desired results.
@ScottBass
Yes, by-groups require the use of a BY statement. But that BY statement should be associated not with the MERGE or SET statements in my prior example, but to an additional SET that reads in only the BY vars, as in:
proc sort data=sashelp.cars out=cars (keep=make type model drivetrain);
by make type drivetrain;
run;
data past_present_future;
if _n_>1 then set cars (keep=model rename=(model=past_model));
merge cars
cars (firstobs=2 keep=model rename=(model=next_model));
set cars (keep=make type drivetrain);
by make type drivetrain;
if first.drivetrain then call missing(of past_:);
if last.drivetrain then call missing(of next_:);
run;
This is easily expanded to carry multiple PAST_ and NEXT_ variables, requiring only the addition of those variables (and associated RENAMEs) in the "IF ... THEN SET" statement and the 2nd argument of the MERGE statement. Just make sure to use PAST_xxx and NEXT_xxx for the renamed variables, so that the CALL MISSING can be left unchanged.
I'm not going to argue over "easily", but I do regard this code as relatively compact and straightforward. Its fundamental requirement is a level of comfort with understanding how the DATA step builds the program-data-vector. If only SAS Institute had written more meaningful documentation on SET and MERGE.
But my main point is that the DATA step need not be limited to "looking back".
regards,
Mark
... View more