BookmarkSubscribeRSS Feed
craigsrsen
Calcite | Level 5

I have a dataset that I am processing in BY-groups and am using multiple LAG functions, e.g. a LAG1 variable, a LAG2 variable, etc.

 

When I reach a new BY group, I want to clear the entire LAG queue from memory and start over. If I use FIRST. I can clear lagged values from the first record in a BY group, but on the second record of the BY group I am still pulling in lagged values from the previous BY group.

 

I could probably devise a work-around but I am wondering if there is a simple command to clear the entire LAG queue.

 

Thanks

6 REPLIES 6
ballardw
Super User

@craigsrsen wrote:

I have a dataset that I am processing in BY-groups and am using multiple LAG functions, e.g. a LAG1 variable, a LAG2 variable, etc.

 

When I reach a new BY group, I want to clear the entire LAG queue from memory and start over. If I use FIRST. I can clear lagged values from the first record in a BY group, but on the second record of the BY group I am still pulling in lagged values from the previous BY group.

 

I could probably devise a work-around but I am wondering if there is a simple command to clear the entire LAG queue.

 

Thanks


No simple way to clear the lag queue entirely.

If you show the code you are currently attempting we can make suggestions that may work in this case.

Likely it is going to involve a counter within the by group and conditional assignment to a different variable than a simple lagn()

hashman
Ammonite | Level 13

@craigsrsen:

 

It cannot be done after the BY group, and here's why. Suppose you have data as below and attempt to clean the queues after each BY group by repeatedly dequeueing in a loop up to the length of the longest lag:

data test ;            
  input id v ;         
  cards ;              
2 1                    
2 2                    
3 1                    
3 2                    
3 3                    
;                      
run ;                  
                       
data _null_ ;          
  do until (last.id) ; 
    set test ;         
    by id ;            
    x1 = lag1 (v) ;    
    x2 = lag2 (v) ;    
    put _n_= x1= x2= ; 
  end ;                
  do _n_ = 1 to 2 ;    
    x = lag1 (.) ;     
    x = lag2 (.) ;     
  end ;                
run ;                  

What you'll see in the log will be (note that _N_ below is the BY group number): 

_N_=1 x1=. x2=.
_N_=1 x1=1 x2=.
_N_=2 x1=2 x2=1
_N_=2 x1=1 x2=2
_N_=2 x1=2 x2=1

 Obviously, nothing has been cleaned up. The reason is that when the compiler sees LAG1 and LAG2 for the second time, it allocates totally new queues having nothing whatsoever to do with the queues referenced in the BY group DO loop, and so the "clean-up" DO loop executes against the second set of the LAGs. The only two ways I see to work around it are:

 

1. Organize your queues using arrays or hash tables (it's actually pretty easy).

2. To the end of each BY group, add as many dummy records as the length of the longest lag. This way, the compiler will see the lags only once but the dummy records will dequeue them before the next BY group appears.

 

WRT #2, for example:

%let maxlag = 2 ; * longest lag ;        
                                         
data dummy / view = dummy ;              
  set test ;                             
  by id ;                                
  output ;                               
  v = . ;                                
  if last.id then do _n_ = 1 to &maxlag ;
    output ;                             
  end ;                                  
run ;                                    
                                         
data _null_ ;                            
  do until (last.id) ;                   
    set dummy ;                          
    by id ;                              
    x1 = lag1 (v) ;                      
    x2 = lag2 (v) ;                      
    put _n_= x1= x2= ;                   
  end ;                                  
run ;                                    

In this case, you'll see in the log:

_N_=1 x1=. x2=.
_N_=1 x1=1 x2=.
_N_=1 x1=2 x2=1
_N_=1 x1=. x2=2
_N_=2 x1=. x2=.
_N_=2 x1=1 x2=.
_N_=2 x1=2 x2=1
_N_=2 x1=3 x2=2
_N_=2 x1=. x2=3

as you'd expect since the lags have been dequeued by calling them for each dummy record. In your program, you'll have to include some conditional statement to separate your processing of the real records from the dummy ones - just make sure it doesn't include any reference to LAG1 or LAG2 ;).    

 

Paul D.

 

ChrisNZ
Tourmaline | Level 20

This is probably how I would implement a proper-queue logic:

proc sort data=SASHELP.CLASS out=CLASS; 
  by SEX NAME;
run;
data CLASS2;
 set CLASS;
  by SEX;
  N1=lag1(NAME) ;
  N2=lag2(NAME) ;
  if SEX ne lag1(SEX) then N1=' ';
  if SEX ne lag2(SEX) then N2=' ';
run;

It's easy to read and to debug.

 

 

craigsrsen
Calcite | Level 5

Thanks Chris. This is basically the solution I ended up going with. It was pretty straightforward.

Tom
Super User Tom
Super User

You can just use RETAIN instead of LAG().  Then it is easy to just use CALL MISSING();

data want ;
   lag5=lag4;
   lag4=lag3;
   lag3=lag2;
   lag2=lag1;
   lag1=x;
   retain lag1-lag5 ;
   set have ;
   by id;
   if first.id then call missing(of lag1-lag5);

   .... do stuff  ...

run;
  
hashman
Ammonite | Level 13

@Tom:

 

Exactly. You're essentially organizing a queue as an expanded (unrolled) array. As such, it can be coded as an unexpanded array as well; and:

 

- if it is non-temporary, CALL MISSING is not needed

- and if the DoW loop is used, RETAIN isn't needed, either

 

E.g.:

data have ;                                               
  do ID = "A", "B" ;                                      
    do x = 1 to 9 ;                                       
      output ;                                            
    end ;                                                 
  end ;                                                   
run ;                                                     
                                                          
data want (drop = _:) ;                                   
  do until (last.id) ;                                    
    set have ;                                            
    by ID ;                                               
    array xq [*] x _q1-_q5 ;                       
    do _i = dim (xq) to 2 by -1 ;                      
      xq [_i] = xq [_i - 1] ;                       
    end ;                                                 
    put ID= x= " q: " _q1-_q5 ;
    /* do stuff */                                        
  end ;                                                   
run ;                                                     

And the log will show:

ID=A x=1 q: 1 . . . .
ID=A x=2 q: 2 1 . . .
ID=A x=3 q: 3 2 1 . .
ID=A x=4 q: 4 3 2 1 .
ID=A x=5 q: 5 4 3 2 1
ID=A x=6 q: 6 5 4 3 2
ID=A x=7 q: 7 6 5 4 3
ID=A x=8 q: 8 7 6 5 4
ID=A x=9 q: 9 8 7 6 5
ID=B x=1 q: 1 . . . .
ID=B x=2 q: 2 1 . . .
ID=B x=3 q: 3 2 1 . .
ID=B x=4 q: 4 3 2 1 .
ID=B x=5 q: 5 4 3 2 1
ID=B x=6 q: 6 5 4 3 2
ID=B x=7 q: 7 6 5 4 3
ID=B x=8 q: 8 7 6 5 4
ID=B x=9 q: 9 8 7 6 5

Quod erat faciendum.

 

One little issue with this way of organizing a queue is the need to shift all the q-values one position backward for each record read from the file. However, that can be easily recoded by either (a) allocating a "big enough" array and merely adjusting index pointers instead of moving the values or (b) using a hash table with its advantage of dynamic sizing. (In fact, there's a section on organizing a hash queue in Chapter 10 of the book "Data Management Solutions Using SAS Hash Table Operations" just published by SAS Press and penned by Don Henderson and yours truly.)   

 

Best

Paul D.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1010 views
  • 7 likes
  • 5 in conversation