Hi, everyone,
What is needed is to calculate for every record (every row) the last 3hr sum of usage (Usage is one of the columns in dataset) grouped by User and ID_option.
Every line(row) represent one record (within 3 min time interval). For example (including desired column sum_usage):
User ID_option time usage sum_usage_3hr 1 a1 12OCT2017:11:20:32 3 10 1 a1 12OCT2017:10:23:24 7 14 1 b1 12OCT2017:09:34:55 12 12 2 b1 12OCT2017:08:55:06 4 6 1 a1 12OCT2017:07:59:53 7 7 2 b1 12OCT2017:06:59:12 2 2
I have tried with something like this code below, and it return me sum of all time, not just the last 3HR. I'm not surprised, but have not so much idea how I'm going to do this in sas.
proc sql:
CREATE table my_table
SELECT *, SUM(usage) AS sum_usage_3hr
FROM prev_table WHERE time BETWEEN TIME and intnx('second', time, -3*3600)
GROUP BY User, ID_option;
RUN;
Any help is welcomed, thanks.
OK. Here is . Hope you have big enough memory to run Hash Table.
data have;
input User ID_option $ time : datetime32. usage ;
format time datetime32. ;
cards;
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 b1 12OCT2017:09:34:55 12
2 b1 12OCT2017:08:55:06 4
1 a1 12OCT2017:07:59:53 7
2 b1 12OCT2017:06:59:12 2
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(usage=_usage));
declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
h.definekey('user','id_option','time');
h.definedata('_usage');
h.definedone();
end;
set have;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
drop _usage i;
run;
data have;
input User ID_option $ time : datetime32. usage ;
format time datetime32. ;
cards;
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 b1 12OCT2017:09:34:55 12
2 b1 12OCT2017:08:55:06 4
1 a1 12OCT2017:07:59:53 7
2 b1 12OCT2017:06:59:12 2
;
run;
proc sql;
select *,(select sum(usage) from have as b where b.user=a.user and b.ID_option=a.ID_option and
b.time between a.time-3*3600 and a.time ) as want
from have as a;
quit;
I'm testing this code now, but it works about 1 hour, and still counting, I don't know why, it looks good.
Maybe there is need for some sorting data before aggregation?
Thank you for your reply.
Do you have a big table ? If you have, then I suggest to split your big table into many small table, each contains only ONE id . and run my code . Or maybe you need other fast algorithm like Hash Table .
Yes, it is a big table, but I think it should be done till now. I will try that, thanks.
Now I've made table a smaller, I've take just 5 of 24 loader files and still have the same problem.
OK. Here is . Hope you have big enough memory to run Hash Table.
data have;
input User ID_option $ time : datetime32. usage ;
format time datetime32. ;
cards;
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 b1 12OCT2017:09:34:55 12
2 b1 12OCT2017:08:55:06 4
1 a1 12OCT2017:07:59:53 7
2 b1 12OCT2017:06:59:12 2
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(usage=_usage));
declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
h.definekey('user','id_option','time');
h.definedata('_usage');
h.definedone();
end;
set have;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
drop _usage i;
run;
I have got an error:
Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid.
And don't forget round your TIME variable, if it has decimal number.
data have;
input User ID_option $ time : datetime32. usage ;
time=int(time);
format time datetime32. ;
Time is in format I have posted in question.
Thank you, I'm going to try that in a few minutes.
If you don't have big memory to run Hash Table, Try this one .
NOTE: these code is assuming there are not duplicate TIME for the same user and id_option.
If there duplicated TIME(e.g. two or more obs have the same TIME under the same user and same id_option),
use PROC SUMMARY to sum USAGE.
data have;
input User ID_option $ time : datetime32. usage ;
format time datetime32. ;
cards;
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 b1 12OCT2017:09:34:55 12
2 b1 12OCT2017:08:55:06 4
1 a1 12OCT2017:07:59:53 7
2 b1 12OCT2017:06:59:12 2
;
run;
proc sort data=have;
by User ID_option ;
run;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h(hashexp:20);
h.definekey('user','id_option','time');
h.definedata('_usage');
h.definedone();
end;
do until(last.id_option);
set have;
by user id_option;
_usage=usage;
h.ref();
end;
do until(last.id_option);
set have;
by user id_option;
sum_usage_3hr=0;
do i=time-3*3600 to time ;
if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
end;
output;
end;
h.clear();
drop _usage i;
run;
I think I have enough memory, 32 GB RAM an 8-core processor.
Just don't get it why is there problem with "No mathing DO/SELECT statement now.
I'm going to try new code.
NOTE: If you have duplicated obs , you need run PROC SUMMARY firstly
i.e.
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 a1 12OCT2017:10:23:24 2
1 a1 12OCT2017:10:23:24 4
proc summary data=have;
by User ID_option time ;
var usage;
output out=have1 sum=;
run;
I have done that, but again is the problem with DO loop control. Same as I mentioned earlier.
I appreciate your help, either we find the solution or not.
It is really weird. I didn't run into any ERROR when running my code . Can you post your LOG ?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.