BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lmg
Calcite | Level 5 lmg
Calcite | Level 5

Hello,

 

I need have a cumulative sum of the next couple of weeks (dependent on a factor variable). I can do this with SQL, but is bad for performance and does not run on large data sets.

 

Basically I want the sum of the next X amount of weeks (based on the column WKfactor) for each unique product and location. Original data does not have the uniquekey, but created it for sql code.

 

UniqueKeyProductLocationWeekWKFactorValuedesired sum 
A1A17-Jan-1944394<-sum of 'value' for the next for the next 4 weeks (comes from WK factor) of product A, Location 1
A1A114-Jan-195276<-sum of 'value' for the next for the next 45weeks (comes from WK factor) of product A, Location 1
A1A121-Jan-1942774<-sum of 'value' for the next for the next 4 weeks (comes from WK factor) of product A, Location 1
A1A128-Jan-1942247<-sum of 'value' for the next for the next 3 weeks (because week factor is 4, but there are only 3 weeks in data set remaining) of product A, Location 1
A1A14-Feb-1942225 
A1A111-Feb-19433 
A2A27-Jan-1923166 
A2A214-Jan-1923576 
A2A221-Jan-1924147 
A2A228-Jan-19266 
B1B17-Jan-19322101 
B1B114-Jan-1923479 
B1B121-Jan-1934597 
B1B128-Jan-1952252 
B1B14-Feb-1923030 

 

This is the SQL code that gives me the desired output, but takes too long doing the inner joins on such a large data set. any better ways?

 

proc sql;
create table work.desire as
select uniquekey, product, location, week, wkfactor, value,
(select sum(value), from work.have where uniquekey=a.uniquekey and week >= a.week and week <= intnx('week', a.week, a.wkfactor)) as desiredsum
from work.have as a;
1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@lmg:

Since you have the (PRODUCT,LOCATION) composite key properly sorted, there's no need for UNIQUEKEY to be created: The processing can rely on the composite key alone; and it looks like a straightforward DoW-loop/array job:

data have ;                                                                        
  input (UniqueKey Product) (:$2.) Location Week :date. WKFactor Value ;           
  format week yymmdd10. ;                                                          
  cards ;                                                                          
A1      A      1      07-Jan-19      4      43      94                             
A1      A      1      14-Jan-19      5       2      76                             
A1      A      1      21-Jan-19      4      27      74                             
A1      A      1      28-Jan-19      4      22      47                             
A1      A      1      04-Feb-19      4      22      25                             
A1      A      1      11-Feb-19      4       3       3                             
A2      A      2      07-Jan-19      2      31      66                             
A2      A      2      14-Jan-19      2      35      76                             
A2      A      2      21-Jan-19      2      41      47                             
A2      A      2      28-Jan-19      2       6       6                             
B1      B      1      07-Jan-19      3      22     101                             
B1      B      1      14-Jan-19      2      34      79                             
B1      B      1      21-Jan-19      3      45      97                             
B1      B      1      28-Jan-19      5      22      52                             
B1      B      1      04-Feb-19      2      30      30                             
;                                                                                  
run ;                                                                              
                                                                                   
data want (drop = _:) ;                                                            
  do _i = 1 by 1 until (last.location) ;                                           
    set have ;                                                                     
    by product location ;                                                          
    array wfs [3, 1000000] _temporary_ ;                                           
    wfs [1, _i] = week ;                                                           
    wfs [2, _i] = wkfactor ;                                                       
    do _j = 1 to _i ;                                                              
      if intck ("week", wfs [1, _j], week) < wfs [2, _j] then wfs [3, _j] + value ;
    end ;                                                                          
  end ;                                                                            
  do _i = 1 to _i ;                                                                
    set have ;                                                                     
    desired_value = wfs [3, _i] ;                                                  
    wfs [3, _i] = . ;                                                              
    output ;                                                                       
  end ;                                                                            
run ;                                                                              

This code relies on the assumption that no (PRODUCT,LOCATION) by-group is larger than 1 million rows. Methinks it's pretty reasonable, and all you sacrifice making it system-wise is about 20 MB of memory. However, if you don't like making assumptions about data whatsoever, be they reasonable or not, you can eschew it by making an extra preliminary pass through the data set to size the upper bounds of the array WFS:

proc sql noprint ;                                                                        
  select max (q) into :hb from (select count (*) as q from have group product, location) ;
quit ;                                                                                    
                                                                                          
data want (drop = _:) ;                                                                   
  do _i = 1 by 1 until (last.location) ;                                                  
    set have ;                                                                            
    by product location ;                                                                 
    array wfs [3, &hb] _temporary_ ;                                                      
    wfs [1, _i] = week ;                                                                  
    wfs [2, _i] = wkfactor ;                                                              
    do _j = 1 to _i ;                                                                     
      if intck ("week", wfs [1, _j], week) < wfs [2, _j] then wfs [3, _j] + value ;       
    end ;                                                                                 
  end ;                                                                                   
  do _i = 1 to _i ;                                                                       
    set have ;                                                                            
    desired_value = wfs [3, _i] ;                                                         
    wfs [3, _i] = . ;                                                                     
    output ;                                                                              
  end ;                                                                                   
run ;                                                                                     

Yet another, even more dynamic approach, is to replace the array with a hash table and thus avoid the extra pass through the data. If you would like to have a stab at this exercise, be my guest; and it you should find it a bit laborious, circle back and someone will sure help you figure it out. Personally, I think that you're perfectly safe with the "1000000" assumption, and the solution using it is most likely to be the fastest time-wise.

 

Kind regards

Paul D.     

View solution in original post

3 REPLIES 3
Reeza
Super User

Do you numbers always vary between 2 and 5 or can they be anything? Do you have a license for SAS ETS?

 

You can check what you have licensed in SAS using:

proc setinit; run;

And what's installed on your system using:

proc product_status; run;

The output will be in the log. 

 


@lmg wrote:

Hello,

 

I need have a cumulative sum of the next couple of weeks (dependent on a factor variable). I can do this with SQL, but is bad for performance and does not run on large data sets.

 

Basically I want the sum of the next X amount of weeks (based on the column WKfactor) for each unique product and location. Original data does not have the uniquekey, but created it for sql code.

 

UniqueKey Product Location Week WKFactor Value desired sum  
A1 A 1 7-Jan-19 4 43 94 <-sum of 'value' for the next for the next 4 weeks (comes from WK factor) of product A, Location 1
A1 A 1 14-Jan-19 5 2 76 <-sum of 'value' for the next for the next 45weeks (comes from WK factor) of product A, Location 1
A1 A 1 21-Jan-19 4 27 74 <-sum of 'value' for the next for the next 4 weeks (comes from WK factor) of product A, Location 1
A1 A 1 28-Jan-19 4 22 47 <-sum of 'value' for the next for the next 3 weeks (because week factor is 4, but there are only 3 weeks in data set remaining) of product A, Location 1
A1 A 1 4-Feb-19 4 22 25  
A1 A 1 11-Feb-19 4 3 3  
A2 A 2 7-Jan-19 2 31 66  
A2 A 2 14-Jan-19 2 35 76  
A2 A 2 21-Jan-19 2 41 47  
A2 A 2 28-Jan-19 2 6 6  
B1 B 1 7-Jan-19 3 22 101  
B1 B 1 14-Jan-19 2 34 79  
B1 B 1 21-Jan-19 3 45 97  
B1 B 1 28-Jan-19 5 22 52  
B1 B 1 4-Feb-19 2 30 30  

 

This is the SQL code that gives me the desired output, but takes too long doing the inner joins on such a large data set. any better ways?

 

proc sql;
create table work.desire as
select uniquekey, product, location, week, wkfactor, value,
(select sum(value), from work.have where uniquekey=a.uniquekey and week >= a.week and week <= intnx('week', a.week, a.wkfactor)) as desiredsum
from work.have as a;

 

hashman
Ammonite | Level 13

@lmg:

Since you have the (PRODUCT,LOCATION) composite key properly sorted, there's no need for UNIQUEKEY to be created: The processing can rely on the composite key alone; and it looks like a straightforward DoW-loop/array job:

data have ;                                                                        
  input (UniqueKey Product) (:$2.) Location Week :date. WKFactor Value ;           
  format week yymmdd10. ;                                                          
  cards ;                                                                          
A1      A      1      07-Jan-19      4      43      94                             
A1      A      1      14-Jan-19      5       2      76                             
A1      A      1      21-Jan-19      4      27      74                             
A1      A      1      28-Jan-19      4      22      47                             
A1      A      1      04-Feb-19      4      22      25                             
A1      A      1      11-Feb-19      4       3       3                             
A2      A      2      07-Jan-19      2      31      66                             
A2      A      2      14-Jan-19      2      35      76                             
A2      A      2      21-Jan-19      2      41      47                             
A2      A      2      28-Jan-19      2       6       6                             
B1      B      1      07-Jan-19      3      22     101                             
B1      B      1      14-Jan-19      2      34      79                             
B1      B      1      21-Jan-19      3      45      97                             
B1      B      1      28-Jan-19      5      22      52                             
B1      B      1      04-Feb-19      2      30      30                             
;                                                                                  
run ;                                                                              
                                                                                   
data want (drop = _:) ;                                                            
  do _i = 1 by 1 until (last.location) ;                                           
    set have ;                                                                     
    by product location ;                                                          
    array wfs [3, 1000000] _temporary_ ;                                           
    wfs [1, _i] = week ;                                                           
    wfs [2, _i] = wkfactor ;                                                       
    do _j = 1 to _i ;                                                              
      if intck ("week", wfs [1, _j], week) < wfs [2, _j] then wfs [3, _j] + value ;
    end ;                                                                          
  end ;                                                                            
  do _i = 1 to _i ;                                                                
    set have ;                                                                     
    desired_value = wfs [3, _i] ;                                                  
    wfs [3, _i] = . ;                                                              
    output ;                                                                       
  end ;                                                                            
run ;                                                                              

This code relies on the assumption that no (PRODUCT,LOCATION) by-group is larger than 1 million rows. Methinks it's pretty reasonable, and all you sacrifice making it system-wise is about 20 MB of memory. However, if you don't like making assumptions about data whatsoever, be they reasonable or not, you can eschew it by making an extra preliminary pass through the data set to size the upper bounds of the array WFS:

proc sql noprint ;                                                                        
  select max (q) into :hb from (select count (*) as q from have group product, location) ;
quit ;                                                                                    
                                                                                          
data want (drop = _:) ;                                                                   
  do _i = 1 by 1 until (last.location) ;                                                  
    set have ;                                                                            
    by product location ;                                                                 
    array wfs [3, &hb] _temporary_ ;                                                      
    wfs [1, _i] = week ;                                                                  
    wfs [2, _i] = wkfactor ;                                                              
    do _j = 1 to _i ;                                                                     
      if intck ("week", wfs [1, _j], week) < wfs [2, _j] then wfs [3, _j] + value ;       
    end ;                                                                                 
  end ;                                                                                   
  do _i = 1 to _i ;                                                                       
    set have ;                                                                            
    desired_value = wfs [3, _i] ;                                                         
    wfs [3, _i] = . ;                                                                     
    output ;                                                                              
  end ;                                                                                   
run ;                                                                                     

Yet another, even more dynamic approach, is to replace the array with a hash table and thus avoid the extra pass through the data. If you would like to have a stab at this exercise, be my guest; and it you should find it a bit laborious, circle back and someone will sure help you figure it out. Personally, I think that you're perfectly safe with the "1000000" assumption, and the solution using it is most likely to be the fastest time-wise.

 

Kind regards

Paul D.     

lmg
Calcite | Level 5 lmg
Calcite | Level 5
This worked! i realized i need the next 5 weeks and to not include the current week value, but i can adjust the code to accordingly. This is a great solution. Thanks again!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1780 views
  • 1 like
  • 3 in conversation