BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dfgbnm
Fluorite | Level 6

I am studying a high frequency tick by tick data. I want to calculate every 5 minutes aggregate volume.

In the data, there is some variables:

1.Date

2.Account

3.Volume

4.Time

In my data, time like 84500.11 is 8:45 or 84716.78 is 8:47. Time is narrow to millisecond.

 

Here is an example:

Time           Volume

84500.11        2

84500.15        3

84700.13        5

84802.36        6

84900.63        5

85000.63        4

 

So the 5 minutes aggregate volume is 2+3+5+6+5=21(85000.63 is out of 85000).

 

Trading time is from 8:45 to 13:45. Can you help me to write the code or loop to calculate every 5 minutes aggregate volume?

 

Thank you for your time. 
Highly appreciate it.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Do you mean 500 stands for 5 minutes?

data have;
input date : $20.         account      volume         time;
cards;
2010/1/4           123            1           84500.155
2010/1/4           456            1           84600.116      
2010/1/4           789           50          84700.117
2010/1/4           452            5           84800.115
2010/1/4           124            2           84900.115
2010/1/4           477            4           85000.145
2010/1/4           412            1           85100.115
2010/1/4           633            6           85200.123
2010/1/4           521            2           85300.112
2010/1/4           213           27          85400.145
2010/1/4           777            3           85500.144
;
run;
data temp;
 set have;
 if _n_=1 then set have(obs=1 keep=time rename=(time=_time));
 group=int((int(time)-int(_time))/500);
run;
proc means data=temp sum;
by group;
var volume;
run;

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

Are you using military time?  I.e., how do you distinguish between 9AM and 9PM?  What happens to your time values when you pass through noon?

 

More generally, where did you get your time values?  I.e. did you produce the numerical time representation you provided us, or did the time values come in some other format that you converted to your demo values?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

How are you defining your 5 minutes? And I assume that you have a date and ticker symbol as well, and you want it for each?

If so, post a better sample data set that reflects your problem.

 

The general solution is to use ROUND() to round the time to the nearest 5 minutes. SAS stores time as seconds, so that would be to the nearest 300.  Then you can use PROC MEANS to summarize the data. 

dfgbnm
Fluorite | Level 6

Thank you for replying. Very sorry for not expressing my question clearly.

The trading time in this data is from 8:45 to 13:45(5 hours).

The 5 minutes interval is defined as:(84500.000-85000.000), (85000.001-85500.000),

(85500.001-90000.000),......, (134000.001-134500.000)

 

Here is an example:

date             account      volume         time
2010/1/4           123            1           84500.155

2010/1/4           456            1           84600.116      

2010/1/4           789           50          84700.117

2010/1/4           452            5           84800.115

2010/1/4           124            2           84900.115

2010/1/4           477            4           85000.145

2010/1/4           412            1           85100.115

2010/1/4           633            6           85200.123

2010/1/4           521            2           85300.112

2010/1/4           213           27          85400.145

2010/1/4           777            3           85500.144

The first 5 minutes aggregate volume is from 84500.000 to 85000.000. So it is 1+1+50+5+2=59.

And next is from 85000.001 to 85500.000. So it is 4+1+6+2+27=40. There are 300/5=60 intervals.

 

You can download the data I upload. The data format is csv.

Many thanks for your help.

mkeintz
PROC Star

The trouble with your time format is that you can't directly measure the elapsed time between time points.  For instance from 8:55:00 to 9:00:00 should be five minutes (5:00).  With values stored as SAS time values (internally stored as seconds after midnight), you could just say  elapsed_seconds=time2-time1.  And you could find the nearest 5 minutes by rounding (or flooring or ceiling) to the 300 second boundary.

 

But in your case subtracting 8:55:00 from 9:00:00, you would have 90000.000-85500.00 ==>  04500 which your format would interpret as 45 minutes.

 

Now if you insist on this format then I would just round up  (use the CEIL function), as in

    end_time=   500*ceil(time/500);

 

Then do a  PROC MEANS ...;  ... BY END_TIME; ...

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

Do you mean 500 stands for 5 minutes?

data have;
input date : $20.         account      volume         time;
cards;
2010/1/4           123            1           84500.155
2010/1/4           456            1           84600.116      
2010/1/4           789           50          84700.117
2010/1/4           452            5           84800.115
2010/1/4           124            2           84900.115
2010/1/4           477            4           85000.145
2010/1/4           412            1           85100.115
2010/1/4           633            6           85200.123
2010/1/4           521            2           85300.112
2010/1/4           213           27          85400.145
2010/1/4           777            3           85500.144
;
run;
data temp;
 set have;
 if _n_=1 then set have(obs=1 keep=time rename=(time=_time));
 group=int((int(time)-int(_time))/500);
run;
proc means data=temp sum;
by group;
var volume;
run;
dfgbnm
Fluorite | Level 6

It works!

Thank you for your coding. Many thanks for your help.

dfgbnm
Fluorite | Level 6

Thank you for your help. 

Would you like to give a suggestion how to change the time format about my data for SAS? And If your time is allowed, could you help to write the code because I have little about SAS?

 

Thank you for your time. Very appreciate it.

mkeintz
PROC Star

Getting a SAS time value from a number whose digits are  HHMMSS.mmm, where

    HH is a one or two digit representation of hour (00 to 23)

    MM is two digit minutes (00-60)

    SS is two digit seconds (00-60)

    .mmm is a sub-second measure

 

You can break down this numeric value to provide three arguments to the HMS function to produce a sas time value from hours, minutes, and seconds.  See SAS Functions and CALL Routines by Category section in the SAS 9.4 Functions and Call Routines: Reference, Fifth Edition.  This will produce a numeric value equal to number of seconds after midnight.  

 

So 09:01:02.345 will have the value 32462.345.  But you want to see 09:01:02.345, not 32400.  So assign a format (in your case the format time12.3) to the variable.

 

  

hh=floor(time/10000);
mm=floor(time/100)-100*hh;
ss=mod(time,100);

newtime=hms(hh,mm,ss);
format newtime  time12.3;

 

Having done the above in a SAS data step, later in the same data step you can calculate the nearest 5 minutes by generating

  

  time_end=300*ceil(time/300);  
format time_end time8.0;

 

Look up the CEIL function in the reference above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
dfgbnm
Fluorite | Level 6

Thank you for replying my question. Highly appreciate it.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2526 views
  • 8 likes
  • 4 in conversation