BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Krueger
Pyrite | Level 9

Right now the best thing I can find is reversing the order of my dataset and using LAG. Is there an alternative to this?

1 ACCEPTED SOLUTION
6 REPLIES 6
hashman
Ammonite | Level 13

@Krueger;

There're many. IMO, the simplest (which I don't find in the link by @Reeza, nor in the old link within the link) is reading the Kth record downstream from the current using SET POINT=K.

 

Here's an example for the most common scenario of creating K=1 lead values in every BY group:

data have ;                                                                                                                                                                                                                                                     
  input id var ;                                                                                                                                                                                                                                                
  cards ;                                                                                                                                                                                                                                                       
1  1                                                                                                                                                                                                                                                            
2  2                                                                                                                                                                                                                                                            
2  3                                                                                                                                                                                                                                                            
3  4                                                                                                                                                                                                                                                            
3  5                                                                                                                                                                                                                                                            
3  6                                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data lead ;                                                                                                                                                                                                                                                     
  set have ;                                                                                                                                                                                                                                           
  by id ;                                                                                                                                                                                                                                                       
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not last.id then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                              
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;                                                                  

Result:

id    var    lead_var                                                                                                                                                                                                                                           
---------------------                                                                                                                                                                                                                                           
 1     1         .                                                                                                                                                                                                                                              
 2     2         3                                                                                                                                                                                                                                              
 2     3         .                                                                                                                                                                                                                                              
 3     4         5                                                                                                                                                                                                                                              
 3     5         6                                                                                                                                                                                                                                              
 3     6         .

If the leads are to be created throughout the file with no regard to BY grouping, the idea is the same, and the code is virtually identical:

data lead ;                                                                                                                                                                                                                                                     
  set have end = z ;                                                                                                                                                                                                                                            
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not z then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                           
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;                          

Kind regards

Paul D.

 

 

ChrisNZ
Tourmaline | Level 20

To me the simplest (and fastest) is simply to use :

data T2;
  merge T1
        T1 (firstobs=2 keep=VAR rename=(VAR=LEAD_VAR) );
run;
hashman
Ammonite | Level 13

@ChrisNZ:

It's simplest and fastest in this simple case. In the more common BY case it gets less simple since the records in each BY group have to be enumerated first, i.e., for example:

data have ;                                                                                                                                                                                                                                                     
  input id var ;                                                                                                                                                                                                                                                
  cards ;                                                                                                                                                                                                                                                       
1  1                                                                                                                                                                                                                                                            
2  2                                                                                                                                                                                                                                                            
2  3                                                                                                                                                                                                                                                            
3  4                                                                                                                                                                                                                                                            
3  5                                                                                                                                                                                                                                                            
3  6                                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data v (keep=id var seq) / view = v ;                                                                                                                                                                                                                                             
  set have ;                                                                                                                                                                                                                                                    
  by id ;                                                                                                                                                                                                                                                       
  if first.id then seq = 1 ;                                                                                                                                                                                                                                    
  else             seq + 1 ;                                                                                                                                                                                                                                    
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data lead (drop = seq) ;                                                                                                                                                                                                                                        
  merge have v (rename=var=lead_var where=(seq > 1)) ;                                                                                                                                                                                                          
  by id ;                                                                                                                                                                                                                                                       
  if last.id then call missing (lead_var) ;                                                                                                                                                                                                                     
run ;                       

 As far as the fastest goes, with this method the file ends up being read twice even when generating a lead is needed only for certain records upon a specific condition. E.g., picture a situation where one needs to replace VAR with its lead only when VAR is missing and such records constitute but a tiny fraction of the whole file. In such a case, it would be faster to grab the needed leads via POINT= only for those specific records, would it not?

 

Kind regards

Paul D.   

ChrisNZ
Tourmaline | Level 20

@hashman For a BY group, I'd probably do something like this:

data LEAD2;
  set HAVE;
  by ID;
  if ^LASTOBS then 
    set HAVE(firstobs=2 keep=VAR rename=(VAR=LEAD_VAR)) end=LASTOBS;
  if last.ID then call missing (LEAD_VAR) ;       
run;

If a small fraction of the values has to be retrieved, the losses due to direct POINT= access would be small, and as you point out this method would probably be preferable.

In any case, since we are reading the next observation, losses of either method are probably limited since we are always reading from the cache. One of the reads will be made from RAM, but . I random access logic has some overheads that sequential read does not.

data HAVE ; do VAR=1 to 1e7; output; end;

data WANT;
  merge HAVE
        HAVE (firstobs=2 keep=VAR rename=(VAR=LEAD_VAR) );
run;
/*    real time           2.48 seconds
      user cpu time       1.95 seconds  */

data lead ;      
  set have end = z ;                                                                                                                                                                                                                                            
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not z then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                           
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;      
/*    real time           3.54 seconds
      user cpu time       2.71 seconds  */

 

 

 

 

hashman
Ammonite | Level 13

@ChrisNZ: Amen to all that. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 871 views
  • 5 likes
  • 4 in conversation