BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rich93johnson
Fluorite | Level 6
Hello, I am relatively new to SAS and have viewed the various posts on the lag subject by group processing (using arrays, proc expand (don't have), etc.). I have below my code that derives a rolling sum which moves forward by day with a fixed window (which could be anything, but in this case is 20,30,40,...100). I am looking to build the code that will enable each separate group to be lagged by the corresponding value from the LOC table (which contains the Loc and the corresponding lag value). However, this is test code for a single item and eventually we will be incorporating thousands of items to view the max Qty over a rolling period. I am not a fan of arrays since it requires breaking out each separate group into a column, which if 1000 Items were needed you would have 1000*N different Loc's all in need of separate lag values so you would have an insane amount of columns. I would prefer to keep it all aligned moving down ONE column with each separate lag by group if that makes sense? 

I am looking for a forward rolling sum which subtracts out the last cumulative sum value using a fixed window that varies across groups. Below gets the job done, but once again it will not be able to adjust to more observations. If you could help that would be great!
 
PROC IMPORT DATAFILE = "/folders/myfolders/SAS Practice/CPS_MODEL_PRACT/sample_loop_CPS.xlsx"
DBMS = XLSX
OUT = LOOP_SAMPLE REPLACE;
RUN;
 
PROC SQL;
CREATE TABLE LOC AS
SELECT DISTINCT Loc, RLT AS RLT
FROM LOOP_SAMPLE;
QUIT;
 
PROC SORT DATA = LOOP_SAMPLE;
BY Loc;
RUN;
 
DATA HAVE;
SET LOOP_SAMPLE;
COUNT + Qty;
IF Loc ="DDSP" THEN A = LAG20(COUNT);
IF Loc ="DDCN" THEN A = LAG30(COUNT);
IF Loc ="DDCT" THEN A = LAG40(COUNT);
IF Loc ="DDDC" THEN A = LAG50(COUNT);
IF Loc ="DDDE" THEN A = LAG60(COUNT);
IF Loc ="DDDK" THEN A = LAG70(COUNT);
IF Loc ="DDJC" THEN A = LAG80(COUNT);
IF Loc ="DDJF" THEN A = LAG90(COUNT);
IF Loc ="DDSI" THEN A = LAG100(COUNT);
ROLLING_TOTAL = COUNT - A;
RUN;
 
PROC MEANS DATA = HAVE MAX;
BY Loc;
VAR ROLLING_TOTAL;
RUN;
 
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

That doesn't look like it needs dynamic programming.  You can use a temporary array to build your own stack of previous value. Use the MOD() function to index into it in a wrap around fashion. The only thing that might need to be dynamic is finding the maximum or B (or LAG) to define how large a stack of lagged values to make.  But why not just make it larger then you ever expect to need?

 

First let's make your posted printout into an actual dataset.  I will change the column headers into valid variable names.

Also let's calculate the relative row number (if you are missing rows then fix that first or else the logic will need to be more complex). (Note you could do that in the next step if you wanted instead).

data have ;
 length name $32 lag row qty want_lagN want_sum want_rolling 8;
 input name $ lag qty want_lagN want_sum want_rolling ;
 if name ne lag(name) then row=0;
 row+1;
cards;
Dog 2 1  .  1 . 
Dog 2 2  .  3 . 
Dog 2 3  1  6 5
Dog 2 4  3 10 7
Dog 2 5  6 15 9
Dog 2 6 10 21 11
Cat 3 1  .  1 . 
Cat 3 2  .  3 .
Cat 3 3  .  6 . 
Cat 3 4  1 10 9
Cat 3 5  3 15 12
;

Now just read in the data. Reset the "stack" when changing to a new "NAME".

data want ;
  array lags [0:100] _temporary_;
  set have;
  by name notsorted;
  if first.name then call missing(of lags[*] sum rolling_sum );
  sum+qty;
  lagn=lags[mod(row,lag)];
  if not missing(lagN) then rolling_sum=sum-lagN;
  lags[mod(row,lag)]=sum;
run;

Results:

                          want_             want_        rolling_
Obs  name  lag  row  qty   lagN  want_sum  rolling  sum     sum    lagn

  1  Dog    2    1    1      .       1         .      1      .       .
  2  Dog    2    2    2      .       3         .      3      .       .
  3  Dog    2    3    3      1       6         5      6      5       1
  4  Dog    2    4    4      3      10         7     10      7       3
  5  Dog    2    5    5      6      15         9     15      9       6
  6  Dog    2    6    6     10      21        11     21     11      10
  7  Cat    3    1    1      .       1         .      1      .       .
  8  Cat    3    2    2      .       3         .      3      .       .
  9  Cat    3    3    3      .       6         .      6      .       .
 10  Cat    3    4    4      1      10         9     10      9       1
 11  Cat    3    5    5      3      15        12     15     12       3

 

View solution in original post

17 REPLIES 17
Patrick
Opal | Level 21

@rich93johnson 

I can't really relate your narrative to the code you've posted. Because you write that the code returns the correct result, I've kept it as original as possible and only added some logic to dynamically generate your IF statements based on what's in the Loc table.

PROC IMPORT DATAFILE = "/folders/myfolders/SAS Practice/CPS_MODEL_PRACT/sample_loop_CPS.xlsx"
  DBMS = XLSX
  OUT = LOOP_SAMPLE REPLACE;
RUN;
PROC SQL;
  CREATE TABLE LOC AS
    SELECT DISTINCT Loc, RLT AS RLT
      FROM LOOP_SAMPLE;
QUIT;

filename codegen temp;
data _null_;
  file codegen;
/*  file print;*/
  set loc;
  length code $80;
  code=cats('IF Loc ="',loc,'" THEN A = LAG',rlt,'(COUNT);');
  if _n_>1 then code=catx(' ','else',code);
  put code;
run;

PROC SORT DATA = LOOP_SAMPLE;
  BY Loc;
RUN;

DATA HAVE;
  SET LOOP_SAMPLE;
  COUNT + Qty;
  %include codegen / source2;
  ROLLING_TOTAL = COUNT - A;
RUN;

PROC MEANS DATA = HAVE MAX;
  BY Loc;
  VAR ROLLING_TOTAL;
RUN;
filename codegen clear;

 

rich93johnson
Fluorite | Level 6
This worked perfectly.

I guess the question was how am I to reference the Loc table and use a Macro variable to run the "if then" statements dynamically, so that as you go down the main dataset, and the Loc changes from one to the next, the macro variable for lag&XXX(COUNT) resolves to the corresponding value to the same Loc from the Loc table. Is it possible to do it this way? While still keeping the same resulting structure?
rich93johnson
Fluorite | Level 6
PROC SQL;
SELECT DISTINCT QUOTE(Loc)
INTO :SKUS SEPARATED BY ', '
FROM Loc;

PROC SORT DATA = LOOP_SAMPLE;
BY Loc;
RUN;

DATA KINDA;
SET LOOP_SAMPLE;
IF FIRST.Loc = 1 THEN COUNT = 0;
COUNT + Qty;
IF Loc IN(&SKUS) THEN A = LAG&RLT(COUNT);
RUN;

Something along the lines of this, however, what I am finding is that the values are not changing dynamically. I think I am close, just missing something.
Patrick
Opal | Level 21

Macro variables &skus and &rlt will have a single value during compilation time of the SAS data step and for this reason the SAS code is static.

IF Loc IN(&SKUS) THEN A = LAG&RLT(COUNT);

You can't have something dynamic the way you have it in mind because during compilation time of the SAS data step all variables and functions must be "spelled out". You can't create additional variables or functions during data step execution time.

 

Good that you posted additional code. Below answered one of my questions. You also would need a By Loc; for this to work.

by loc;
IF FIRST.Loc = 1 THEN COUNT = 0;

What I still don't understand and also can't see anywhere in your code: You're talking about a rolling window and quarterly. I can't see any attempt in your code to have such a rolling window. It's just a "lagged" summing up per Loc (which could easily get implemented via a Proc SQL). Any sort of rolling window would only sum entries within a specified date range based on the date of the current row.

Can you please further explain this rolling window thing?  

 

 

rich93johnson
Fluorite | Level 6
AB
Dog2
Cat3

 

ALagQtylag(Cum_Qty)Cum_QtyRolling Total
Dog21 1 
Dog22 3 
Dog23165
Dog243107
Dog256159
Dog26102111
Cat31 1 
Cat32 3 
Cat33 6 
Cat341109
Cat3531512

 

The example above is what I am looking for essentially, which you have provided in terms of getting across what I am doing just for that specific dataset, but not if I introduce more variables in column A with different values for lag. How the rolling window works is the Rolling Total of 5 for instance is 2+3 in Qty (2 values or a "window" of 2) and 7 is 4+3 (continuing the window of 2). The fixed window moves forward each row (or day if there was a date column). For Cat the Rolling Total of 9 is 4+3+2 (3 values for a window of 3) and 12 is 5+4+3 (fixed 3 value window). I want to use the value Lag equaling 2 or 3 as an input for the lag function but lagN(Cum_Qty) won't allow you to directly insert the value corresponding to the Lag column, which is why I developed a separate table using Proc Sql to then get a macro variable inserted into the lag function so I could end up with lag&lag(Cum_Qty) which would resolve to 2 for Dog and 3 for Cat (hence the dynamic aspect). Below I have posted the progress I have made.

 

PROC SQL;
CREATE TABLE LOC AS
SELECT DISTINCT Loc, RLT AS RLT
FROM LOOP_SAMPLE;
QUIT;

 

%LET NumUni=100;
proc sql;
select distinct Loc,RLT
into :LOC1-:LOC&NumUni , :VAR1-:VAR&NumUni
from LOOP_SAMPLE;
quit;
 
proc sql;
select distinct QUOTE(Loc),RLT
into :LOC1-:LOC9 , :VAR1-:VAR9
from LOOP_SAMPLE;
quit;
 
PROC SORT DATA = LOOP_SAMPLE;
BY Loc;
RUN;
 
DATA HERE_GOES;
SET LOOP_SAMPLE;
WHERE Date > &CURR & Item = &NIIN;
BY Loc;
IF FIRST.Loc = 1 THEN COUNT = 0;
COUNT + Qty;
IF Loc = "&LOC1" THEN A = LAG&VAR1(COUNT);
/* IF Loc = "&LOC2" THEN A = LAG&VAR2(COUNT); */
/* IF Loc = "&LOC3" THEN A = LAG&VAR3(COUNT); */
/* IF Loc = "&LOC4" THEN A = LAG&VAR4(COUNT); */
/* IF Loc = "&LOC5" THEN A = LAG&VAR5(COUNT); */
/* IF Loc = "&LOC6" THEN A = LAG&VAR6(COUNT); */
/* IF Loc = "&LOC7" THEN A = LAG&VAR7(COUNT); */
/* IF Loc = "&LOC8" THEN A = LAG&VAR8(COUNT); */
ELSE A = LAG&VAR9(Qty);
RUN;
 
This works per se but if I could turn the IF THEN statements into a dynamic code (using a DO LOOP possbily) I would be golden!
Not sure if I said this yet, but I am absolutely honored to receive help on this. 
Patrick
Opal | Level 21

@rich93johnson 

So Cum_Qty is just summing up all rows (lagged, per loc) meaning if you've got dates from 2016 to 2020 it contains in the end the sum of all these years?

rich93johnson
Fluorite | Level 6
At the end of 4 years you would have the Rolling Total being equivalent to the total of Qty of December 31st, 2020 and December 30th, because the window of 2 moves with, subtracting out all the other days.
rich93johnson
Fluorite | Level 6
But yes, the Cum_Qty would be reiterated by Loc and the Lag(Cum_Qty) would be lagged by the amount specified in the Lag column.
Patrick
Opal | Level 21

ok. I'll give it a go. The solution will likely be something along the line as solved here.

Patrick
Opal | Level 21

@rich93johnson wrote:
At the end of 4 years you would have the Rolling Total being equivalent to the total of Qty of December 31st, 2020 and December 30th, because the window of 2 moves with, subtracting out all the other days.

So do you want a rolling 3 month look-back period or do you want to start afresh every quarter? If starting afresh each quarter: Would you include lagged values from the previous quarter or only include them if they are from the same quarter? 

rich93johnson
Fluorite | Level 6
I sort of understand your question. I would want a rolling 3 month look back period. So today minus 90 days and then tomorrow minus 90 days, which would subtract out the -91st day (because "today" would be tomorrow) and then the following day minus 90 days which would subtract out the 91st and 92nd day looking backwards and so forth. The 3 month period is fixed and moves forward by 1 day.
Patrick
Opal | Level 21

@Tom s code nicely demonstrates how much more efficient a SAS data step can process sequentially as compared to traditional SQL.

I've applied @Tom s code to more realistic sample data - and I've also posted the SQL I came up with.

Interestingly the results start to differ starting with day 93.

I believe what's missing in @Tom 's code is logic for a time based windows for column sum. I leave it to you/Tom to investigate and eventually fix.

...actually: I believe something in my SQL is not working as it should. The data step approach is so much better that I'm now too lazy to fix this. I wish I could give Tom 10 likes for what he's posted.

proc datasets lib=work kill nolist;
run;quit;

data LOOP_SAMPLE;
  FORMAT
    Item BEST12.
    Date DATE9.
    Loc  $CHAR4.
    Qty  BEST12.
    RLT  BEST12.;
  call streaminit(12345);
/*  do date='03oct2016'd to '12jan2017'd;*/
  do date='03oct2016'd to '12jan2020'd;
    do item=1 to 3;
      do loc='DDSP','DDCN','DDCT';
        if loc='DDSP' then rlt=20;
        else if loc='DDCN' then rlt=10;
        else rlt=20;
        qty=ceil(rand('uniform',0,999));
        output;
      end;
    end;
  end;
  stop;
run;

proc datasets lib=work nolist nowarn mt=(data view);
  delete v_lag_cum_qty v_cum_qty want1;
  run;
quit;

proc sql;
  create view v_lag_cum_qty as
/*  create table v_lag_cum_qty as*/
  select 
    l.loc, l.item, l.date,
    sum(r.qty) as lag_cum_qty
  from 
    loop_sample l
    left join
    loop_sample r
    on 
      l.loc=r.loc and l.item=r.item 
      and r.date between intnx('month',l.date-l.rlt,-3,'s')+1 and l.date-l.rlt
  group by l.loc, l.item, l.date
  ;

  create view v_cum_qty as
/*  create table v_cum_qty as*/
  select 
    l.loc, l.item, l.date,
    sum(r.qty) as cum_qty
  from 
    loop_sample l
    left join
    loop_sample r
    on 
      l.loc=r.loc and l.item=r.item 
      and r.date between intnx('month',l.date,-3,'s')+1 and l.date
  group by l.loc, l.item, l.date
  ;

  create table want1 as
    select 
      l.*,
      r1.lag_cum_qty,
      r2.cum_qty,
      r2.cum_qty - r1.lag_cum_qty as Rolling_Total
    from 
      loop_sample l
      inner join
      v_lag_cum_qty r1
        on l.loc=r1.loc and l.item =r1.item and l.date=r1.date
      inner join
      v_cum_qty r2
        on l.loc=r2.loc and l.item =r2.item and l.date=r2.date
    order by l.loc, l.item, l.date
  ;
quit;

proc print data=want1(obs=95  firstobs=88);
run;


proc sort data=LOOP_SAMPLE;
  by loc item date;
run;

data want2;
  set LOOP_SAMPLE;
  by loc item date;
  array lags [0:100] _temporary_;
  if first.item then 
    call missing(of lags[*] row sum rolling_sum );
  row+1;
  sum+qty;
  lagn=lags[mod(row,rlt)];
  if not missing(lagN) then rolling_sum=sum-lagN;
  lags[mod(row,rlt)]=sum;
run;
proc print data=want2(obs=95 firstobs=88);
run;

 Capture.JPG

Tom
Super User Tom
Super User

That doesn't look like it needs dynamic programming.  You can use a temporary array to build your own stack of previous value. Use the MOD() function to index into it in a wrap around fashion. The only thing that might need to be dynamic is finding the maximum or B (or LAG) to define how large a stack of lagged values to make.  But why not just make it larger then you ever expect to need?

 

First let's make your posted printout into an actual dataset.  I will change the column headers into valid variable names.

Also let's calculate the relative row number (if you are missing rows then fix that first or else the logic will need to be more complex). (Note you could do that in the next step if you wanted instead).

data have ;
 length name $32 lag row qty want_lagN want_sum want_rolling 8;
 input name $ lag qty want_lagN want_sum want_rolling ;
 if name ne lag(name) then row=0;
 row+1;
cards;
Dog 2 1  .  1 . 
Dog 2 2  .  3 . 
Dog 2 3  1  6 5
Dog 2 4  3 10 7
Dog 2 5  6 15 9
Dog 2 6 10 21 11
Cat 3 1  .  1 . 
Cat 3 2  .  3 .
Cat 3 3  .  6 . 
Cat 3 4  1 10 9
Cat 3 5  3 15 12
;

Now just read in the data. Reset the "stack" when changing to a new "NAME".

data want ;
  array lags [0:100] _temporary_;
  set have;
  by name notsorted;
  if first.name then call missing(of lags[*] sum rolling_sum );
  sum+qty;
  lagn=lags[mod(row,lag)];
  if not missing(lagN) then rolling_sum=sum-lagN;
  lags[mod(row,lag)]=sum;
run;

Results:

                          want_             want_        rolling_
Obs  name  lag  row  qty   lagN  want_sum  rolling  sum     sum    lagn

  1  Dog    2    1    1      .       1         .      1      .       .
  2  Dog    2    2    2      .       3         .      3      .       .
  3  Dog    2    3    3      1       6         5      6      5       1
  4  Dog    2    4    4      3      10         7     10      7       3
  5  Dog    2    5    5      6      15         9     15      9       6
  6  Dog    2    6    6     10      21        11     21     11      10
  7  Cat    3    1    1      .       1         .      1      .       .
  8  Cat    3    2    2      .       3         .      3      .       .
  9  Cat    3    3    3      .       6         .      6      .       .
 10  Cat    3    4    4      1      10         9     10      9       1
 11  Cat    3    5    5      3      15        12     15     12       3

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 4968 views
  • 3 likes
  • 3 in conversation