BookmarkSubscribeRSS Feed

Look-Ahead and Look-Back

Started ‎07-05-2018 by
Modified ‎07-06-2018 by
Views 8,884

A common requirement in programming is to be able to look ahead at the next record or back at the previous record in order to make a calculation or comparison. In the SAS DATA step, look-ahead is trickier than look-back. In SQL, there is symmetry.

For example, start with

 

data have;
input ID $ Index Measure;
cards;
A 1 11
A 2 12
A 3 13
B 1 21
B 2 22
;

How can each observation be extended to include the next and previous values of MEASURE, with missing values in the appropriate places at the beginning, end, and ID boundaries? In other words, how can one derive this data set:

 

 

                            Next_      Prev_
 ID    Index    Measure    Measure    Measure    

 A       1         11         12          .
 A       2         12         13         11
 A       3         13          .         12
 B       1         21         22          .
 B       2         22          .         21

Here's a DATA step solution:

 

data look_both_ways;
set have;
by ID;
set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
    have (      obs = 1 drop = _all_                                     );
Prev_Measure = ifn( first.ID, (.), lag(Measure) );
Next_Measure = ifn(  last.ID, (.), Next_Measure );
run;

Notice that the look-back values are supplied by the LAG function, while the look-ahead requires an additional SET statement coordinated carefully with the first one.

Here's a PROC SQL solution in the form of a chain of LEFT JOINs:

proc sql;
create table look_both_ways as
 select   have.*
        , next.Measure as Next_Measure
        , prev.Measure as Prev_Measure           
  from have
       left join
       have as prev
         on have.ID    = prev.ID
            and 
            have.Index = prev.Index + 1
       left join
       have as next
         on have.ID    = next.ID
            and
            have.Index = next.Index - 1
;
quit;

Notice that this SQL solution depends on the INDEX values with their equal increments, not on the order of the rows in the given table.

Here's another SQL solution, one which does not need the INDEX column but rather depends on the strict monotonicity of the column (MEASURE in this case) for which we need the look-ahead and look-back. Instead of a chain of joins, there is a structure of nested queries.

create table look_both_ways as
 select lookback.*, next.Measure as Next_Measure         
  from  ( select have.*, prev.Measure as Prev_Measure         
           from have left join have as prev
             on have.ID = prev.ID and prev.Measure < have.Measure
           group by have.ID, have.Measure
           having prev.Measure = max(prev.Measure)
        ) as lookback 
   left join have as next
     on lookback.ID = next.ID and next.Measure > lookback.Measure
   group by lookback.ID, lookback.Measure
   having next.Measure = min(next.Measure)
;

 This may be a bit easier to follow if broken into two statements, with the inline view replaced by a named view.

create view lookback as
 select have.*, prev.Measure as Prev_Measure         
  from have left join have as prev
   on have.ID = prev.ID and prev.Measure < have.Measure
  group by have.ID, have.Measure
  having prev.Measure = max(prev.Measure)
  ;
create table look_both_ways as
 select lookback.*, next.Measure as Next_Measure         
  from lookback left join have as next
   on lookback.ID = next.ID and next.Measure > lookback.Measure
  group by lookback.ID, lookback.Measure
  having next.Measure = min(next.Measure)
  ;

 All of the solutions produce the specified results, but that's only because of the way the example is contrived. The DATA step "reads" the order of the observations in the source. The first SQL view reads the evenly separated INDEX values. The second SQL solution reads the sequence of the MEASURE values. In general the three methods will produce different results.

 

Originally posted by Howard Schreier on sasCommunity.org.

Comments

For the data step technique, instead of:

 

data look_both_ways;
  set have;
  by ID;
  set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
      have (      obs = 1 drop = _all_                                     );
  ... other code here ...

consider

 

data look_both_ways;
  set have (keep=id);
  by ID;
  merge have 
        have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) );
  ... other code ...

 

I suggest this alternative because you don't need to make the SET HAVE (firstobs=2) statement to "wrap around" the end of the dataset to the first obs, a technique that is used to prevent a premature stop to the data step.  Using merge means there's less additional coding, and it would be easier to accommodate "firstobs=3" or 4, etc, since MERGE statement  does not stop until ALL the merged data sets are exhausted.

Version history
Last update:
‎07-06-2018 04:04 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels
Article Tags