A common requirement in programming is to be able to look ahead at the next record or back at the previous record in order to make a calculation or comparison. In the SAS DATA step, look-ahead is trickier than look-back. In SQL, there is symmetry.
For example, start with
data have;
input ID $ Index Measure;
cards;
A 1 11
A 2 12
A 3 13
B 1 21
B 2 22
;
How can each observation be extended to include the next and previous values of MEASURE, with missing values in the appropriate places at the beginning, end, and ID boundaries? In other words, how can one derive this data set:
Next_ Prev_
ID Index Measure Measure Measure
A 1 11 12 .
A 2 12 13 11
A 3 13 . 12
B 1 21 22 .
B 2 22 . 21
Here's a DATA step solution:
data look_both_ways;
set have;
by ID;
set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
have ( obs = 1 drop = _all_ );
Prev_Measure = ifn( first.ID, (.), lag(Measure) );
Next_Measure = ifn( last.ID, (.), Next_Measure );
run;
Notice that the look-back values are supplied by the LAG function, while the look-ahead requires an additional SET statement coordinated carefully with the first one.
Here's a PROC SQL solution in the form of a chain of LEFT JOINs:
proc sql; create table look_both_ways as select have.* , next.Measure as Next_Measure , prev.Measure as Prev_Measure from have left join have as prev on have.ID = prev.ID and have.Index = prev.Index + 1 left join have as next on have.ID = next.ID and have.Index = next.Index - 1 ; quit;
Notice that this SQL solution depends on the INDEX values with their equal increments, not on the order of the rows in the given table.
Here's another SQL solution, one which does not need the INDEX column but rather depends on the strict monotonicity of the column (MEASURE in this case) for which we need the look-ahead and look-back. Instead of a chain of joins, there is a structure of nested queries.
create table look_both_ways as select lookback.*, next.Measure as Next_Measure from ( select have.*, prev.Measure as Prev_Measure from have left join have as prev on have.ID = prev.ID and prev.Measure < have.Measure group by have.ID, have.Measure having prev.Measure = max(prev.Measure) ) as lookback left join have as next on lookback.ID = next.ID and next.Measure > lookback.Measure group by lookback.ID, lookback.Measure having next.Measure = min(next.Measure) ;
This may be a bit easier to follow if broken into two statements, with the inline view replaced by a named view.
create view lookback as select have.*, prev.Measure as Prev_Measure from have left join have as prev on have.ID = prev.ID and prev.Measure < have.Measure group by have.ID, have.Measure having prev.Measure = max(prev.Measure) ;
create table look_both_ways as select lookback.*, next.Measure as Next_Measure from lookback left join have as next on lookback.ID = next.ID and next.Measure > lookback.Measure group by lookback.ID, lookback.Measure having next.Measure = min(next.Measure) ;
All of the solutions produce the specified results, but that's only because of the way the example is contrived. The DATA step "reads" the order of the observations in the source. The first SQL view reads the evenly separated INDEX values. The second SQL solution reads the sequence of the MEASURE values. In general the three methods will produce different results.
Originally posted by Howard Schreier on sasCommunity.org.
For the data step technique, instead of:
data look_both_ways;
set have;
by ID;
set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
have ( obs = 1 drop = _all_ );
... other code here ...
consider
data look_both_ways;
set have (keep=id);
by ID;
merge have
have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) );
... other code ...
I suggest this alternative because you don't need to make the SET HAVE (firstobs=2) statement to "wrap around" the end of the dataset to the first obs, a technique that is used to prevent a premature stop to the data step. Using merge means there's less additional coding, and it would be easier to accommodate "firstobs=3" or 4, etc, since MERGE statement does not stop until ALL the merged data sets are exhausted.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.