I am studying a high frequency tick by tick data. I want to calculate every 5 minutes aggregate volume.
In the data, there is some variables:
1.Date
2.Account
3.Volume
4.Time
In my data, time like 84500.11 is 8:45 or 84716.78 is 8:47. Time is narrow to millisecond.
Here is an example:
Time Volume
84500.11 2
84500.15 3
84700.13 5
84802.36 6
84900.63 5
85000.63 4
So the 5 minutes aggregate volume is 2+3+5+6+5=21(85000.63 is out of 85000).
Trading time is from 8:45 to 13:45. Can you help me to write the code or loop to calculate every 5 minutes aggregate volume?
Thank you for your time.
Highly appreciate it.
Do you mean 500 stands for 5 minutes?
data have;
input date : $20. account volume time;
cards;
2010/1/4 123 1 84500.155
2010/1/4 456 1 84600.116
2010/1/4 789 50 84700.117
2010/1/4 452 5 84800.115
2010/1/4 124 2 84900.115
2010/1/4 477 4 85000.145
2010/1/4 412 1 85100.115
2010/1/4 633 6 85200.123
2010/1/4 521 2 85300.112
2010/1/4 213 27 85400.145
2010/1/4 777 3 85500.144
;
run;
data temp;
set have;
if _n_=1 then set have(obs=1 keep=time rename=(time=_time));
group=int((int(time)-int(_time))/500);
run;
proc means data=temp sum;
by group;
var volume;
run;
Are you using military time? I.e., how do you distinguish between 9AM and 9PM? What happens to your time values when you pass through noon?
More generally, where did you get your time values? I.e. did you produce the numerical time representation you provided us, or did the time values come in some other format that you converted to your demo values?
How are you defining your 5 minutes? And I assume that you have a date and ticker symbol as well, and you want it for each?
If so, post a better sample data set that reflects your problem.
The general solution is to use ROUND() to round the time to the nearest 5 minutes. SAS stores time as seconds, so that would be to the nearest 300. Then you can use PROC MEANS to summarize the data.
Thank you for replying. Very sorry for not expressing my question clearly.
The trading time in this data is from 8:45 to 13:45(5 hours).
The 5 minutes interval is defined as:(84500.000-85000.000), (85000.001-85500.000),
(85500.001-90000.000),......, (134000.001-134500.000)
Here is an example:
date account volume time
2010/1/4 123 1 84500.155
2010/1/4 456 1 84600.116
2010/1/4 789 50 84700.117
2010/1/4 452 5 84800.115
2010/1/4 124 2 84900.115
2010/1/4 477 4 85000.145
2010/1/4 412 1 85100.115
2010/1/4 633 6 85200.123
2010/1/4 521 2 85300.112
2010/1/4 213 27 85400.145
2010/1/4 777 3 85500.144
The first 5 minutes aggregate volume is from 84500.000 to 85000.000. So it is 1+1+50+5+2=59.
And next is from 85000.001 to 85500.000. So it is 4+1+6+2+27=40. There are 300/5=60 intervals.
You can download the data I upload. The data format is csv.
Many thanks for your help.
The trouble with your time format is that you can't directly measure the elapsed time between time points. For instance from 8:55:00 to 9:00:00 should be five minutes (5:00). With values stored as SAS time values (internally stored as seconds after midnight), you could just say elapsed_seconds=time2-time1. And you could find the nearest 5 minutes by rounding (or flooring or ceiling) to the 300 second boundary.
But in your case subtracting 8:55:00 from 9:00:00, you would have 90000.000-85500.00 ==> 04500 which your format would interpret as 45 minutes.
Now if you insist on this format then I would just round up (use the CEIL function), as in
end_time= 500*ceil(time/500);
Then do a PROC MEANS ...; ... BY END_TIME; ...
Do you mean 500 stands for 5 minutes?
data have;
input date : $20. account volume time;
cards;
2010/1/4 123 1 84500.155
2010/1/4 456 1 84600.116
2010/1/4 789 50 84700.117
2010/1/4 452 5 84800.115
2010/1/4 124 2 84900.115
2010/1/4 477 4 85000.145
2010/1/4 412 1 85100.115
2010/1/4 633 6 85200.123
2010/1/4 521 2 85300.112
2010/1/4 213 27 85400.145
2010/1/4 777 3 85500.144
;
run;
data temp;
set have;
if _n_=1 then set have(obs=1 keep=time rename=(time=_time));
group=int((int(time)-int(_time))/500);
run;
proc means data=temp sum;
by group;
var volume;
run;
It works!
Thank you for your coding. Many thanks for your help.
Thank you for your help.
Would you like to give a suggestion how to change the time format about my data for SAS? And If your time is allowed, could you help to write the code because I have little about SAS?
Thank you for your time. Very appreciate it.
Getting a SAS time value from a number whose digits are HHMMSS.mmm, where
HH is a one or two digit representation of hour (00 to 23)
MM is two digit minutes (00-60)
SS is two digit seconds (00-60)
.mmm is a sub-second measure
You can break down this numeric value to provide three arguments to the HMS function to produce a sas time value from hours, minutes, and seconds. See SAS Functions and CALL Routines by Category section in the SAS 9.4 Functions and Call Routines: Reference, Fifth Edition. This will produce a numeric value equal to number of seconds after midnight.
So 09:01:02.345 will have the value 32462.345. But you want to see 09:01:02.345, not 32400. So assign a format (in your case the format time12.3) to the variable.
hh=floor(time/10000); mm=floor(time/100)-100*hh; ss=mod(time,100); newtime=hms(hh,mm,ss); format newtime time12.3;
Having done the above in a SAS data step, later in the same data step you can calculate the nearest 5 minutes by generating
time_end=300*ceil(time/300);
format time_end time8.0;
Look up the CEIL function in the reference above.
Thank you for replying my question. Highly appreciate it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.