Hi there,
have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.
Grid:
Seconds
3300100
3300200
3300300
.
.
Data example:
ID date miliseconds volume
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
Output:
ID date seconds volume
ABC 01/01/15 3300100 100
ABC 01/01/15 3300200 500
ABC 01/01/15 3300300 400
ABC 01/01/15 3300400 600
(
time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)
data grid;
do seconds = '09:10't to '16:50't ;
output ;
end ;
run;
proc sql ;
create table want as
select a.id, a.date, a.miliseconds, sum(a.volume) as Volume
from have a
left join
grid1 b on a.id = b.id
and a.date = b.date
and a.miliseconds = floor(b.seconds)
group by a.id, a.date, a.miliseconds ;
quit;
run;
Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines. Always thankful for input! Best
I don't understand why these obs should be the same group ? ID date miliseconds volume GROUP ABC 01/01/15 3300214 100 1 ABC 01/01/15 3300300 200 2 ABC 01/01/15 3300415 300 2 ABC 01/01/15 4400122 400 3 ABC 01/01/15 4500022 200 4 ABC 01/01/15 4500111 400 4
Are you sure you're talking milliseconds here, an not "centiseconds"?
My solution:
data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
;
run;
proc summary data=have;
by id date next_full_sec;
var volume;
output out=want (drop=_type_) sum=;
run;
Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).
The rest of your code looks really useful and thank you really much so far!
Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be
9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.
This would be my solution to include the grid:
data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 33002140 100
ABC 01/01/15 33003000 200
ABC 01/01/15 33004150 300
ABC 01/01/15 44001220 400
ABC 01/01/15 45000220 200
ABC 01/01/15 45001110 400
;
run;
data grid;
do seconds = 1 to 86400;
output;
end;
run;
proc summary data=have;
by id date next_full_sec;
var volume;
output out=int (drop=_type_) sum=;
run;
/* create a reference table for all IDs, dates, and seconds */
proc sql;
create table seconds as
select
distinct id, date, seconds
from have, grid
order by id, date, seconds
;
quit;
data want;
merge
seconds (in=a rename=(seconds=next_full_sec))
int (in=b)
;
by id date next_full_sec;
if a;
if not b
then do;
_freq_ = 0;
volume = 0;
end;
run;
proc print noobs;
run;
Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:
641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
646 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.
647 quit;
It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)
Best
PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though
Hi!
First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.
If your datasets are quite large, I recommend to do two steps:
proc sort
data=data.test (keep=id date)
out=data.id_date
nodupkey
;
by id date;
run;
proc sql;
create table data.seconds as
select id, date, seconds
from data.id_date, data.grid
order by id, date, seconds
;
quit;
Now the distinct is not necessary, as that has been dealt with in the first step.
The NOTE about the cartesian joins is expected; it's exactly what we want to do.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.