BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jovic92
Obsidian | Level 7

Hi, everyone,

What is needed is to calculate for every record (every row) the last 3hr sum of usage (Usage is one of the columns in dataset) grouped by User and ID_option. 

Every line(row) represent one record (within 3 min time interval). For example (including desired column sum_usage):

 

 

User  ID_option          time             usage        sum_usage_3hr
1         a1        12OCT2017:11:20:32       3             10
1         a1        12OCT2017:10:23:24       7             14
1         b1        12OCT2017:09:34:55       12            12
2         b1        12OCT2017:08:55:06       4              6
1         a1        12OCT2017:07:59:53       7              7
2         b1        12OCT2017:06:59:12       2              2

I have tried with something like this code below, and it return me sum of all time, not just the last 3HR. I'm not surprised, but have not so much idea how I'm going to do this in sas. 

proc sql:
CREATE table my_table
SELECT *, SUM(usage) AS sum_usage_3hr
FROM prev_table WHERE time BETWEEN TIME and intnx('second', time, -3*3600)
GROUP BY User, ID_option;
RUN;

 Any help is welcomed, thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. Here is . Hope you have big enough memory to run Hash Table.

 

data have;
input User  ID_option  $        time   : datetime32.          usage   ;
format  time   datetime32. ;
cards; 
1         a1        12OCT2017:11:20:32       3             
1         a1        12OCT2017:10:23:24       7             
1         b1        12OCT2017:09:34:55       12            
2         b1        12OCT2017:08:55:06       4             
1         a1        12OCT2017:07:59:53       7             
2         b1        12OCT2017:06:59:12       2             
;
run;
data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('user','id_option','time');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
 if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
drop _usage i;
run;

View solution in original post

21 REPLIES 21
Ksharp
Super User
data have;
input User  ID_option  $        time   : datetime32.          usage   ;
format  time   datetime32. ;
cards; 
1         a1        12OCT2017:11:20:32       3             
1         a1        12OCT2017:10:23:24       7             
1         b1        12OCT2017:09:34:55       12            
2         b1        12OCT2017:08:55:06       4             
1         a1        12OCT2017:07:59:53       7             
2         b1        12OCT2017:06:59:12       2             
;
run;
proc sql;
select *,(select sum(usage) from have as b where b.user=a.user and b.ID_option=a.ID_option and
b.time between a.time-3*3600 and a.time ) as want
 from have as a;
quit;
jovic92
Obsidian | Level 7

I'm testing this code now, but it works about 1 hour, and still counting, I don't know why, it looks good. 

Maybe there is need for some sorting data before aggregation? 

Thank you for your reply. 

Ksharp
Super User

Do you have a big table ? If you have, then I suggest to split your big table into many small table, each contains only ONE id . and run my code . Or maybe you need other fast algorithm like Hash Table .

jovic92
Obsidian | Level 7

Yes, it is a big table, but I think it should be done till now. I will try that, thanks. 

jovic92
Obsidian | Level 7

Now I've made table a smaller, I've take just 5 of 24 loader files and still have the same problem. 

Ksharp
Super User

OK. Here is . Hope you have big enough memory to run Hash Table.

 

data have;
input User  ID_option  $        time   : datetime32.          usage   ;
format  time   datetime32. ;
cards; 
1         a1        12OCT2017:11:20:32       3             
1         a1        12OCT2017:10:23:24       7             
1         b1        12OCT2017:09:34:55       12            
2         b1        12OCT2017:08:55:06       4             
1         a1        12OCT2017:07:59:53       7             
2         b1        12OCT2017:06:59:12       2             
;
run;
data want;
 if _n_=1 then do;
  if 0 then set have(rename=(usage=_usage));
  declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
  h.definekey('user','id_option','time');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
 if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
drop _usage i;
run;
jovic92
Obsidian | Level 7

I have got an error: 

Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid. 

Ksharp
Super User

And don't forget round your TIME variable, if it has decimal number.

 

data have;
input User  ID_option  $        time   : datetime32.          usage   ;
time=int(time);
format  time   datetime32. ;
jovic92
Obsidian | Level 7

Time is in format I have posted in question. 

Thank you, I'm going to try that in a few minutes. 

Ksharp
Super User

If you don't have big memory to run Hash Table, Try this one .

NOTE: these code is assuming there are not duplicate TIME for the same user and id_option.

If there duplicated TIME(e.g.  two or more obs have the same TIME under the same user and same id_option),

use PROC SUMMARY to sum USAGE.

 

data have;
input User  ID_option  $        time   : datetime32.          usage   ;
format  time   datetime32. ;
cards; 
1         a1        12OCT2017:11:20:32       3             
1         a1        12OCT2017:10:23:24       7             
1         b1        12OCT2017:09:34:55       12            
2         b1        12OCT2017:08:55:06       4             
1         a1        12OCT2017:07:59:53       7             
2         b1        12OCT2017:06:59:12       2             
;
run;
proc sort data=have;
by User  ID_option ;
run;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h(hashexp:20);
  h.definekey('user','id_option','time');
  h.definedata('_usage');
  h.definedone();
 end;

do until(last.id_option);
set have;
by user id_option;
 _usage=usage;
 h.ref();
end;

do until(last.id_option);
set have;
by user id_option;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
 if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
output;
end;
h.clear();
drop _usage i;
run;
jovic92
Obsidian | Level 7

I think I have enough memory, 32 GB RAM an 8-core processor. 

Just don't get it why is there problem with "No mathing DO/SELECT statement now. 

 

I'm going to try new code. 

Ksharp
Super User

NOTE: If you have duplicated obs , you need run PROC SUMMARY firstly 

i.e.

1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7

1 a1 12OCT2017:10:23:24 2

1 a1 12OCT2017:10:23:24 4

 

 

proc summary data=have;

by  User  ID_option         time ;

var usage;

output out=have1 sum=;

run;

jovic92
Obsidian | Level 7

I have done that, but again is the problem with DO loop control. Same as I mentioned earlier. 

I appreciate your help, either we find the solution or not. 

Ksharp
Super User

It is really weird. I didn't run into any ERROR when running my code . Can you post your LOG ?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 33982 views
  • 10 likes
  • 3 in conversation