have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.
ID date miliseconds volume ABC 01/01/15 3300214 100 ABC 01/01/15 3300300 200 ABC 01/01/15 3300415 300 ABC 01/01/15 4400122 400 ABC 01/01/15 4500022 200 ABC 01/01/15 4500111 400
ID date seconds volume ABC 01/01/15 3300100 100 ABC 01/01/15 3300200 500 ABC 01/01/15 3300300 400 ABC 01/01/15 3300400 600
time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)
data grid; do seconds = '09:10't to '16:50't ; output ; end ; run; proc sql ; create table want as select a.id, a.date, a.miliseconds, sum(a.volume) as Volume from have a left join grid1 b on a.id = b.id and a.date = b.date and a.miliseconds = floor(b.seconds) group by a.id, a.date, a.miliseconds ; quit; run;
Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines. Always thankful for input! Best
I don't understand why these obs should be the same group ? ID date miliseconds volume GROUP ABC 01/01/15 3300214 100 1 ABC 01/01/15 3300300 200 2 ABC 01/01/15 3300415 300 2 ABC 01/01/15 4400122 400 3 ABC 01/01/15 4500022 200 4 ABC 01/01/15 4500111 400 4
Are you sure you're talking milliseconds here, an not "centiseconds"?
data have; input id $ date:mmddyy10. milliseconds volume; next_full_sec = int((milliseconds-1) / 1000) + 1; format date mmddyy10. next_full_sec time8.; cards; ABC 01/01/15 3300214 100 ABC 01/01/15 3300300 200 ABC 01/01/15 3300415 300 ABC 01/01/15 4400122 400 ABC 01/01/15 4500022 200 ABC 01/01/15 4500111 400 ; run; proc summary data=have; by id date next_full_sec; var volume; output out=want (drop=_type_) sum=; run;
Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).
The rest of your code looks really useful and thank you really much so far!
Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be
9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.
This would be my solution to include the grid:
data have; input id $ date:mmddyy10. milliseconds volume; next_full_sec = int((milliseconds-1) / 1000) + 1; format date mmddyy10. next_full_sec time8.; cards; ABC 01/01/15 33002140 100 ABC 01/01/15 33003000 200 ABC 01/01/15 33004150 300 ABC 01/01/15 44001220 400 ABC 01/01/15 45000220 200 ABC 01/01/15 45001110 400 ; run; data grid; do seconds = 1 to 86400; output; end; run; proc summary data=have; by id date next_full_sec; var volume; output out=int (drop=_type_) sum=; run;
/* create a reference table for all IDs, dates, and seconds */ proc sql; create table seconds as select distinct id, date, seconds from have, grid order by id, date, seconds ; quit; data want; merge seconds (in=a rename=(seconds=next_full_sec)) int (in=b) ; by id date next_full_sec; if a; if not b then do; _freq_ = 0; volume = 0; end; run; proc print noobs; run;
Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:
641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.
It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)
PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though
First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.
If your datasets are quite large, I recommend to do two steps:
proc sort data=data.test (keep=id date) out=data.id_date nodupkey ; by id date; run; proc sql; create table data.seconds as select id, date, seconds from data.id_date, data.grid order by id, date, seconds ; quit;
Now the distinct is not necessary, as that has been dealt with in the first step.
The NOTE about the cartesian joins is expected; it's exactly what we want to do.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.