SAS Programming

Krueger · Posted 11-08-2019 03:19 PM

Right now the best thing I can find is reversing the order of my dataset and using LAG. Is there an alternative to this?

Reeza · Posted 11-08-2019 03:20 PM

If you have SAS ETS then PROC EXPAND can do that.
If not, here's some approaches.
https://communities.sas.com/t5/SAS-Communities-Library/How-to-simulate-the-LEAD-function-opposite-of...

View solution in original post

Reeza · Posted 11-08-2019 03:20 PM

If you have SAS ETS then PROC EXPAND can do that.
If not, here's some approaches.
https://communities.sas.com/t5/SAS-Communities-Library/How-to-simulate-the-LEAD-function-opposite-of...

hashman · Posted 11-09-2019 12:04 AM

@Krueger;

There're many. IMO, the simplest (which I don't find in the link by @Reeza, nor in the old link within the link) is reading the Kth record downstream from the current using SET POINT=K.

Here's an example for the most common scenario of creating K=1 lead values in every BY group:

data have ;                                                                                                                                                                                                                                                     
  input id var ;                                                                                                                                                                                                                                                
  cards ;                                                                                                                                                                                                                                                       
1  1                                                                                                                                                                                                                                                            
2  2                                                                                                                                                                                                                                                            
2  3                                                                                                                                                                                                                                                            
3  4                                                                                                                                                                                                                                                            
3  5                                                                                                                                                                                                                                                            
3  6                                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data lead ;                                                                                                                                                                                                                                                     
  set have ;                                                                                                                                                                                                                                           
  by id ;                                                                                                                                                                                                                                                       
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not last.id then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                              
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;

Result:

id    var    lead_var                                                                                                                                                                                                                                           
---------------------                                                                                                                                                                                                                                           
 1     1         .                                                                                                                                                                                                                                              
 2     2         3                                                                                                                                                                                                                                              
 2     3         .                                                                                                                                                                                                                                              
 3     4         5                                                                                                                                                                                                                                              
 3     5         6                                                                                                                                                                                                                                              
 3     6         .

If the leads are to be created throughout the file with no regard to BY grouping, the idea is the same, and the code is virtually identical:

data lead ;                                                                                                                                                                                                                                                     
  set have end = z ;                                                                                                                                                                                                                                            
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not z then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                           
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;

Kind regards

Paul D.

ChrisNZ · Posted 11-09-2019 06:04 AM

To me the simplest (and fastest) is simply to use :

data T2;
  merge T1
        T1 (firstobs=2 keep=VAR rename=(VAR=LEAD_VAR) );
run;

High-Performance SAS Coding - Third Edition

hashman · Posted 11-09-2019 01:34 PM

@ChrisNZ:

It's simplest and fastest in this simple case. In the more common BY case it gets less simple since the records in each BY group have to be enumerated first, i.e., for example:

data have ;                                                                                                                                                                                                                                                     
  input id var ;                                                                                                                                                                                                                                                
  cards ;                                                                                                                                                                                                                                                       
1  1                                                                                                                                                                                                                                                            
2  2                                                                                                                                                                                                                                                            
2  3                                                                                                                                                                                                                                                            
3  4                                                                                                                                                                                                                                                            
3  5                                                                                                                                                                                                                                                            
3  6                                                                                                                                                                                                                                                            
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data v (keep=id var seq) / view = v ;                                                                                                                                                                                                                                             
  set have ;                                                                                                                                                                                                                                                    
  by id ;                                                                                                                                                                                                                                                       
  if first.id then seq = 1 ;                                                                                                                                                                                                                                    
  else             seq + 1 ;                                                                                                                                                                                                                                    
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data lead (drop = seq) ;                                                                                                                                                                                                                                        
  merge have v (rename=var=lead_var where=(seq > 1)) ;                                                                                                                                                                                                          
  by id ;                                                                                                                                                                                                                                                       
  if last.id then call missing (lead_var) ;                                                                                                                                                                                                                     
run ;

As far as the fastest goes, with this method the file ends up being read twice even when generating a lead is needed only for certain records upon a specific condition. E.g., picture a situation where one needs to replace VAR with its lead only when VAR is missing and such records constitute but a tiny fraction of the whole file. In such a case, it would be faster to grab the needed leads via POINT= only for those specific records, would it not?

Kind regards

Paul D.

ChrisNZ · Posted 11-10-2019 04:13 PM

@hashman For a BY group, I'd probably do something like this:

data LEAD2;
  set HAVE;
  by ID;
  if ^LASTOBS then 
    set HAVE(firstobs=2 keep=VAR rename=(VAR=LEAD_VAR)) end=LASTOBS;
  if last.ID then call missing (LEAD_VAR) ;       
run;

If a small fraction of the values has to be retrieved, the losses due to direct POINT= access would be small, and as you point out this method would probably be preferable.

In any case, since we are reading the next observation, losses of either method are probably limited since we are always reading from the cache. One of the reads will be made from RAM, but . I random access logic has some overheads that sequential read does not.

data HAVE ; do VAR=1 to 1e7; output; end;

data WANT;
  merge HAVE
        HAVE (firstobs=2 keep=VAR rename=(VAR=LEAD_VAR) );
run;
/*    real time           2.48 seconds
      user cpu time       1.95 seconds  */

data lead ;      
  set have end = z ;                                                                                                                                                                                                                                            
  _n_ + 1 ;                                                                                                                                                                                                                                                     
  if not z then set have (keep=var rename=var=lead_var) point = _n_ ;                                                                                                                                                                                           
  else call missing (lead_var) ;                                                                                                                                                                                                                                
run ;      
/*    real time           3.54 seconds
      user cpu time       2.71 seconds  */

High-Performance SAS Coding - Third Edition

hashman · Posted 11-10-2019 05:07 PM

@ChrisNZ: Amen to all that.

SAS Programming

Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Re: Is there a LEAD() function like in SQL?

Fun with PROC SQL Summary Functions

Loss Function

Remove leading zeros function?

CATS Function Deleting Leading Zeros

[SQL] 집합 연산

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...