BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ErinKSimmons
Obsidian | Level 7

Between a "Start_Date" and "End_Date", which are user input prompt values, I would like to find the maximum two week total demand value by inv_id.

 

I have attache an Excel file that has a daily_dmd tab which contains the table that has the summary table of total demand by day by inv_id. I can provide the code as necessary that generated this data set. 

 

Also in the Excel file is the list of all the date values and the date 14 days prior for which I would want find the total two week demand by INV_ID and then find the max of all those listed date ranges.

 

I do not have access to the PROC EXPAND method which is the rub. 

 

End goal is to have a table with two columns; INV_ID and MAX_TWOWEEKDMD.

Thanks in advance for all help.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below code first creates a rolling sum per inv_id over a given date range (here 14 days) and then selects the row with the max sum per inv_id.

 

%let start_dt='01jan2018'd;
%let stop_dt='31dec2020'd;
%let day_range=14;

libname source xlsx '~/test/DataSets.xlsx';
data have;
  set source.daily_dmd;
  INV_TXN_DT2=datepart(INV_TXN_DT);
  format INV_TXN_DT2 date9.;
run;

/* create rolling sum */
proc sql;
  create table rolling_sum as
    select 
      inv_id,
      INV_TXN_DT2,
      INV_TXN_DT2-&day_range+1 as prior_dt format=date9.,
      daily_dmd,
      (
        select sum(daily_dmd)
        from have i
        where 
          i.inv_id=o.inv_id
          and i.INV_TXN_DT2 between INV_TXN_DT2-&day_range+1 and o.INV_TXN_DT2
        ) as
        TWOWEEKDMD
    from have o
    where INV_TXN_DT2 between &start_dt and &stop_dt
    order by inv_id, INV_TXN_DT2
  ;
quit;

And now the rolling sum code combined with the selection of the max sum

/* select max value from rolling sum 
   DISTINCT result set because there can be multiple dates with a max sum
*/
proc sql;
  create table want as
    select distinct
      inv_id,
      max(TWOWEEKDMD) as MAX_TWOWEEKDMD
    from
      (
        select 
          inv_id,
          (
            select sum(daily_dmd)
            from have i
            where 
              i.inv_id=o.inv_id
              and i.INV_TXN_DT2 between INV_TXN_DT2-&day_range+1 and o.INV_TXN_DT2
            ) as
            TWOWEEKDMD
        from have o
        where INV_TXN_DT2 between &start_dt and &stop_dt
      )
    group by inv_id
    having max(TWOWEEKDMD) = TWOWEEKDMD
  ;
quit;

 

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Below code first creates a rolling sum per inv_id over a given date range (here 14 days) and then selects the row with the max sum per inv_id.

 

%let start_dt='01jan2018'd;
%let stop_dt='31dec2020'd;
%let day_range=14;

libname source xlsx '~/test/DataSets.xlsx';
data have;
  set source.daily_dmd;
  INV_TXN_DT2=datepart(INV_TXN_DT);
  format INV_TXN_DT2 date9.;
run;

/* create rolling sum */
proc sql;
  create table rolling_sum as
    select 
      inv_id,
      INV_TXN_DT2,
      INV_TXN_DT2-&day_range+1 as prior_dt format=date9.,
      daily_dmd,
      (
        select sum(daily_dmd)
        from have i
        where 
          i.inv_id=o.inv_id
          and i.INV_TXN_DT2 between INV_TXN_DT2-&day_range+1 and o.INV_TXN_DT2
        ) as
        TWOWEEKDMD
    from have o
    where INV_TXN_DT2 between &start_dt and &stop_dt
    order by inv_id, INV_TXN_DT2
  ;
quit;

And now the rolling sum code combined with the selection of the max sum

/* select max value from rolling sum 
   DISTINCT result set because there can be multiple dates with a max sum
*/
proc sql;
  create table want as
    select distinct
      inv_id,
      max(TWOWEEKDMD) as MAX_TWOWEEKDMD
    from
      (
        select 
          inv_id,
          (
            select sum(daily_dmd)
            from have i
            where 
              i.inv_id=o.inv_id
              and i.INV_TXN_DT2 between INV_TXN_DT2-&day_range+1 and o.INV_TXN_DT2
            ) as
            TWOWEEKDMD
        from have o
        where INV_TXN_DT2 between &start_dt and &stop_dt
      )
    group by inv_id
    having max(TWOWEEKDMD) = TWOWEEKDMD
  ;
quit;

 

ErinKSimmons
Obsidian | Level 7
Thank you Patrick for your quick solution
mkeintz
PROC Star

 

The sql approach shows that SQL can be forced to do something it is not well-suited for - namely rolling sums across rows.  But if the data are already  sorted by INV_ID   INC_TXN_DT, then there is a much, much simpler and more efficient approach.

 

data want (drop=_:);

  array two_weeks {0:13} _temporary_;
  call missing(of two_weeks{*});

  do until (last.inv_id);
    set have;
    by inv_id;
    _dmd_date=datepart(inc_txn_dt);
    _lag_dmd_date=lag(dmd_date);

    if first.id=0 and (_dmd_date-1>_lag_dmd_date) then do _d=_lag_dmd_date+1 to _dmd_date-1;
      two_weeks{mod(_d,14)}=0;
    end;

    two_weeks{mod(_dmd_date,14)}+daily_dmd;
    max_two_weeks=max(max_two_weeks,sum(of two_weeks{*}));
  end;
 
run;
  1. The DO  UNTIL ... do-group has an embedded SET statement, which means  SAS will process all obs within one INV_ID group at a time.

  2. The  TWO_WEEKS array holds the most recent two weeks of data - i.e. it is a rolling trailing 14 days of DMD values. It's indexed from 0 to 13, corresponding with the remainder produced when dividing a date value by 14.
  3. the "if first.id=0  and (dmd_date-1>lag_dmd_date) ..." do loop makes sure that when there is a gap in dates, that you zero out data that would be wrongly inherited from older dates.

 

I suspect this approach is much closer to how PROC EXPAND works underneath the hood.

 

Code corrected per @ErinKSimmons 's note.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErinKSimmons
Obsidian | Level 7
Thank you! This was also a great solution.

Small note, a ) is missing in this line:
max_two_weeks=max(max_two_weeks,sum(of two_weeks{*});

should be:
max_two_weeks=max(max_two_weeks,sum(of two_weeks{*}));

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 864 views
  • 2 likes
  • 3 in conversation