Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Rooling sum for one column between first datetime and datetime of that record (row) in SAS

I have one problem and I think there is not much to correct to work right. I have table:

 

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  time   datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 

So, I need to sum all usage values from time_purchase (for one id, opt combination (group by id, opt) there is just one unique t_purchase) until t_spent. Also, I have about million of rows, so hash table would be the best solution. I've tried with:

 

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=intck('second', t_purchase, t_spent) to t_spent ;
 if h.find(key:user,key:id_option,key:i)=0 then sum_usage-_usage;
end;
drop _usage i;
run;

 The fifth line from the bottom is not correct for sure,(do i=intck('second', t_purchase, t_spent) but have no idea how to approach this, how to select all records between two datetime values. So, the main problem is how to set up time interval to calculate this. I have already one function in this hash table but with the slightly different time interval, so it would be pretty good to write this one too, but it's not necessary.

 

Desired output:

id  opt    t_purchase            t_spent       bonus   usage sum_usage
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3        9
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0        9
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6        6
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3        10
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12        21 
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 .      7

Accepted Solutions
Solution
‎11-30-2017 11:03 AM
Super User
Posts: 10,313

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

OK. If I understand your question.

 

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=t_purchase to t_spent ;
 if h.find(key:id,key:opt,key:t_purchase,key:i)=0 then sum_usage+_usage;
end;
drop _usage i;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,145

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

What do you expect the "fifth line from the bottom" to be?  You do mean the second record, with t_spent=10NOV2017:15:07:33, right?

 

Both record 1 (usage=3), and record 3 (usage=6) fall within the time-spanned in record 2.  That's a total of sum_usage=9 (since record 2 has usage=0).  Records 4 and 6 have a different id/opt combination, and record 5 is outside the record 2 time span, so their usage is not part of record 2 sum_usage.

Contributor
Posts: 23

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

I thought fifth line from bottom in code, not table. Intck function can't be used, i guess this way in this solution, but I have no idea how to solve this.
Regular Contributor
Posts: 245

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Does this do what you are looking for?

proc sort data=have;
    BY id opt t_purchase t_spent;
run;

DATA want;
    
    DO UNTIL(LAST.id);
        SET work.HAVE;
        BY id opt t_purchase t_spent;
        IF (FIRST.id) THEN sum_usage=0;
        sum_usage + usage;
        OUTPUT;
    END;
RUN;

Ahmed

Trusted Advisor
Posts: 1,145

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

Posted in reply to AhmedAl_Attar

This can be a bit more compact.  In particular you don't need the "do until ..." group, and you don't need the explicit OUTPUT statement:

 

 

proc sort data=have;
    BY id opt t_purchase t_spent;
run;

DATA want;
  SET work.HAVE;
  BY id opt t_purchase;
  IF (FIRST.purchase) THEN sum_usage=0;
  sum_usage + usage;
RUN;
PROC Star
Posts: 165

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

If you can sort the input data, this should do it:

proc sort data=have;
  by id opt t_purchase t_spent;
run;
data want;
  set have;
  by id opt t_purchase t_spent;
  if first.t_purchase then
    sum_usage=usage;
  else
    sum_usage+usage;
run;

 

Solution
‎11-30-2017 11:03 AM
Super User
Posts: 10,313

Re: Rooling sum for one column between first datetime and datetime of that record (row) in SAS

OK. If I understand your question.

 

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  bonus  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:15:07:33   100     0  
1    a1  10NOV2017:12:02:00  10NOV2017:13:24:50   100     6 
2    a1  10NOV2017:13:54:00  10NOV2017:14:02:58   100     3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   100    12    
2    a1  10NOV2017:13:54:00  10NOV2017:13:57:12   100     7 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('id','opt', 't_purchase', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage=0;
do i=t_purchase to t_spent ;
 if h.find(key:id,key:opt,key:t_purchase,key:i)=0 then sum_usage+_usage;
end;
drop _usage i;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 156 views
  • 0 likes
  • 5 in conversation