Hi there,
have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.
Grid:
Seconds
3300100
3300200
3300300
.
.
Data example:
ID date miliseconds volume
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
Output:
ID date seconds volume
ABC 01/01/15 3300100 100
ABC 01/01/15 3300200 500
ABC 01/01/15 3300300 400
ABC 01/01/15 3300400 600
(
time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)
data grid;
do seconds = '09:10't to '16:50't ;
output ;
end ;
run;
proc sql ;
create table want as
select a.id, a.date, a.miliseconds, sum(a.volume) as Volume
from have a
left join
grid1 b on a.id = b.id
and a.date = b.date
and a.miliseconds = floor(b.seconds)
group by a.id, a.date, a.miliseconds ;
quit;
run;
Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines. Always thankful for input! Best
I don't understand why these obs should be the same group ? ID date miliseconds volume GROUP ABC 01/01/15 3300214 100 1 ABC 01/01/15 3300300 200 2 ABC 01/01/15 3300415 300 2 ABC 01/01/15 4400122 400 3 ABC 01/01/15 4500022 200 4 ABC 01/01/15 4500111 400 4
Are you sure you're talking milliseconds here, an not "centiseconds"?
My solution:
data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
;
run;
proc summary data=have;
by id date next_full_sec;
var volume;
output out=want (drop=_type_) sum=;
run;
Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).
The rest of your code looks really useful and thank you really much so far!
Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be
9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.
This would be my solution to include the grid:
data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 33002140 100
ABC 01/01/15 33003000 200
ABC 01/01/15 33004150 300
ABC 01/01/15 44001220 400
ABC 01/01/15 45000220 200
ABC 01/01/15 45001110 400
;
run;
data grid;
do seconds = 1 to 86400;
output;
end;
run;
proc summary data=have;
by id date next_full_sec;
var volume;
output out=int (drop=_type_) sum=;
run;
/* create a reference table for all IDs, dates, and seconds */
proc sql;
create table seconds as
select
distinct id, date, seconds
from have, grid
order by id, date, seconds
;
quit;
data want;
merge
seconds (in=a rename=(seconds=next_full_sec))
int (in=b)
;
by id date next_full_sec;
if a;
if not b
then do;
_freq_ = 0;
volume = 0;
end;
run;
proc print noobs;
run;
Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:
641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
646 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.
647 quit;
It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)
Best
PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though
Hi!
First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.
If your datasets are quite large, I recommend to do two steps:
proc sort
data=data.test (keep=id date)
out=data.id_date
nodupkey
;
by id date;
run;
proc sql;
create table data.seconds as
select id, date, seconds
from data.id_date, data.grid
order by id, date, seconds
;
quit;
Now the distinct is not necessary, as that has been dealt with in the first step.
The NOTE about the cartesian joins is expected; it's exactly what we want to do.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.