BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hashman
Ammonite | Level 13

@Chad_OSU:

Ahhh ... that's a different story: A waaaay better presentation of the task and expected results - but also quite different an arrangement.

The simple approach I've posted before won't cut here for two reasons:

  1. you need to determine the size of each successive collection of records with the same (group, mv) key-value before deciding whether to mark its records as counted or missing   
  2. hence, it cannot be done in a purely single pass through the data  

Below, is one way of cracking this - pretty general if the limits and the driver data are stored in separate files, as you showed - which, by the way, is a sensible way to structure data processing. Note that I've added a bunch of successive 5's in HAVE, as otherwise we cannot have a proper test case accounting for all eventualities: 

data limits ;                                                
  input group $ limit ;                                      
  cards ;                                                    
x 3                                                          
y 5                                                          
run ;                                                        
                                                             
data have ;                                                  
  input group $ MW ;                                         
  cards ;                                                    
x 0                                                          
x 0                                                          
x 5                                                          
x 5                                                          
x 0                                                          
x 0                                                          
x 0                                                          
x 0                                                          
x 0                                                          
x 0                                                          
x 0                                                          
y 0                                                          
y 0                                                          
y 5                                                          
y 0                                                          
y 0                                                          
y 0                                                          
y 0                                                          
y 0                                                          
y 0                                                          
y 0                                                          
y 5                                                          
y 5                                                          
y 5                                                          
y 5                                                          
y 5                                                          
y 5                                                          
y 5                                                          
run ;                                                        
                                                             
data want (drop = _:) ;                                      
  if _n_ = 1 then do ;                                       
    if 0 then set limits (keep = limit) ;                    
    dcl hash h (dataset:"limits") ;                          
    h.definekey  ("group") ;                                 
    h.definedata ("limit") ;                                 
    h.definedone () ;                                        
  end ;                                                      
  do _q = 1 by 1 until (last.mw) ;                           
    set have ;                                               
    by group mw notsorted ;                                  
  end ;                                                      
  h.find() ;                                                 
  do _n_ = 1 to _q ;                                         
    set have ;                                               
    if mw or (_q < limit) or (_n_ > limit) then counter = . ;
    else counter = _n_ ;                                     
    output ;                                                 
  end ;                                                      
run ;                                                        

Above, a hash table (H) is used to store the correspondence between GROUP and LIMIT in memory, so that LIMIT could be easily lookup up by GROUP downstream. The first DoW-loop determines the size of each (group,mw) BY group. The second reads the same BY group and selects the proper value for COUNTER based on the specified conditions.

 

Alternatively, the same can be done by using MERGE to pair the records from the driver with the appropriate LIMIT values (instead of using the hash table). However, it involves two steps since NOTSORTED cannot be coded with MERGE (or UPDATE) statements.

data havelimits ;                                            
  merge have limits ;                                        
  by group ;                                                 
run ;                                                        
                                                             
data want (drop = _:) ;                                      
  do _q = 1 by 1 until (last.mw) ;                           
    set havelimits ;                                         
    by group mw notsorted ;                                  
  end ;                                                      
  do _n_ = 1 to _q ;                                         
    set havelimits ;                                         
    if mw or (_q < limit) or (_n_ > limit) then counter = . ;
    else counter = _n_ ;                                     
    output ;                                                 
  end ;                                                      
run ;                                                        

Kind regards

Paul D.

Chad_OSU
Obsidian | Level 7

Thank you for the response.  Using your second MERGE method, how could I get this to apply ONLY upon a change from MW>0 to MW=0.  If you change line 3 of HAVE to x=0, the counter starts when is should not:

 

Capture.PNG

 

I would expect the first 3 lines to have no count.

 

Thanks for your help!

hashman
Ammonite | Level 13

@Chad_OSU :

After each BY group, you need to generate a flag to indicate whether the last record in this BY group has MW>0 and include this flag in the IF condition setting COUNTER to missing. Below, _IORC_ is used as such flag because it is (a) auto-retained, (b) auto-set to 0 at compile time, and (c) auto-dropped. The rest of the logic remains intact, so the second step would look like: 

data want (drop = _:) ;                                                        
  do _q = 1 by 1 until (last.mw) ;                                             
    set havelimits ;                                                           
    by group mw notsorted ;                                                    
  end ;                                                                        
  do _n_ = 1 to _q ;                                                           
    set havelimits ;                                                           
    if _iorc_ = 0 or mw > 0 or (_q < limit) or (_n_ > limit) then counter = . ;
    else counter = _n_ ;                                                       
    output ;                                                                   
  end ;                                                                        
  _iorc_ = mw > 0 ;                                                            
run ;                                                                          

Kind regards

Paul D.

Chad_OSU
Obsidian | Level 7

Awesome. thank you!!

hashman
Ammonite | Level 13

@Chad_OSU:

I like the way @novinosrin thinks - and the DoW loop he uses to achieve the result. However, methinks that in this case, it can be achieved simpler because:

- by the nature of the task, it doesn't require BY

- there's no need to impose any artificial limit on the counter

 

data have ;                                 
  input mw @@ ;                             
  cards ;                                   
5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 
;                                           
run ;                                       
                                            
data want ;                                 
  set have ;                                
  counter + mw = 0 ;                          
  if mw then counter = . ;                  
run ;                                       

Kind regards

 

Paul D.

Chad_OSU
Obsidian | Level 7

Thanks for the reply. In a later response I added that I do actually need to apply a limit to the counter.

novinosrin
Tourmaline | Level 20
data limits;
input group$ limit;
datalines;
x 3
y 5
;
run;

data have;
input group$ MW;
datalines;
x 0
x 0
x 5
x 5
x 0
x 0
x 0
x 0
x 0
x 0
x 0
y 0
y 0
y 5
y 0
y 0
y 0
y 0
y 0
y 0
y 0
run;

data want;
do until(last.group);
 merge have limits;
 by group;
 if mw then _f=1;
 if _f then _c=sum(_c,mw=0);
 counter=_c;
 if mw or  _c>limit then counter=.;
 output;
end;
drop _:;
run;
hashman
Ammonite | Level 13

@novinosrin:

Kudos! just what the doctor ordered.

So, in fact it can be done in a purely single pass.

I retract my wrongheaded statement to the contrary.

 

Kind regards

Paul D. 

novinosrin
Tourmaline | Level 20

Guru @hashman   Just to let you know. You are "original" and I am "copy". I have been following you since 2013. Well back then I didn't know anything about advance techniques, let alone hash. Of course it's only the last couple of years, I grasped the concepts.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 23 replies
  • 2739 views
  • 11 likes
  • 5 in conversation