Solved: Rooling sum for one column between first datetime and datetime of that...

jovic92 · Posted 11-29-2017 09:32 AM

I have one problem and I think there is not much to correct to work right. I have table:

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  time   datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7

So, I need to sum all usage values from time_purchase (for one id, opt combination (group by id, opt) there is just one unique t_purchase) until t_spent. Also, I have about million of rows, so hash table would be the best solution. I've tried with:

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=intck('second', t_purchase, t_spent) to t_spent ;
 if h.find(key:user,key:id_option,key:i)=0 then sum_usage-_usage;
end;
drop _usage i;
run;

The fifth line from the bottom is not correct for sure,(do i=intck('second', t_purchase, t_spent) but have no idea how to approach this, how to select all records between two datetime values. So, the main problem is how to set up time interval to calculate this. I have already one function in this hash table but with the slightly different time interval, so it would be pretty good to write this one too, but it's not necessary.

Desired output:

id  opt    t_purchase            t_spent       bonus   usage sum_usage
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3        9
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0        9
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6        6
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3        10
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12        21 
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 .      7

Ksharp · Posted 11-30-2017 09:03 AM

OK. If I understand your question.

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=t_purchase to t_spent ;
 if h.find(key:id,key:opt,key:t_purchase,key:i)=0 then sum_usage+_usage;
end;
drop _usage i;
run;

View solution in original post

mkeintz · Posted 11-29-2017 10:30 AM

What do you expect the "fifth line from the bottom" to be? You do mean the second record, with t_spent=10NOV2017:15:07:33, right?

Both record 1 (usage=3), and record 3 (usage=6) fall within the time-spanned in record 2. That's a total of sum_usage=9 (since record 2 has usage=0). Records 4 and 6 have a different id/opt combination, and record 5 is outside the record 2 time span, so their usage is not part of record 2 sum_usage.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jovic92 · Posted 11-29-2017 10:33 AM

I thought fifth line from bottom in code, not table. Intck function can't be used, i guess this way in this solution, but I have no idea how to solve this.

AhmedAl_Attar · Posted 11-29-2017 10:35 AM

Does this do what you are looking for?

proc sort data=have;
    BY id opt t_purchase t_spent;
run;

DATA want;
    
    DO UNTIL(LAST.id);
        SET work.HAVE;
        BY id opt t_purchase t_spent;
        IF (FIRST.id) THEN sum_usage=0;
        sum_usage + usage;
        OUTPUT;
    END;
RUN;

Ahmed

mkeintz · Posted 11-29-2017 03:44 PM

This can be a bit more compact. In particular you don't need the "do until ..." group, and you don't need the explicit OUTPUT statement:

proc sort data=have;
    BY id opt t_purchase t_spent;
run;

DATA want;
  SET work.HAVE;
  BY id opt t_purchase;
  IF (FIRST.purchase) THEN sum_usage=0;
  sum_usage + usage;
RUN;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

s_lassen · Posted 11-30-2017 04:24 AM

If you can sort the input data, this should do it:

proc sort data=have;
  by id opt t_purchase t_spent;
run;
data want;
  set have;
  by id opt t_purchase t_spent;
  if first.t_purchase then
    sum_usage=usage;
  else
    sum_usage+usage;
run;

Ksharp · Posted 11-30-2017 09:03 AM

OK. If I understand your question.

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=t_purchase to t_spent ;
 if h.find(key:id,key:opt,key:t_purchase,key:i)=0 then sum_usage+_usage;
end;
drop _usage i;
run;

Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

SAS Innovate 2025: Call for Content

Classroom Training Available!