08-11-2016 02:38 AM
have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.
ID date miliseconds volume ABC 01/01/15 3300214 100 ABC 01/01/15 3300300 200 ABC 01/01/15 3300415 300 ABC 01/01/15 4400122 400 ABC 01/01/15 4500022 200 ABC 01/01/15 4500111 400
ID date seconds volume ABC 01/01/15 3300100 100 ABC 01/01/15 3300200 500 ABC 01/01/15 3300300 400 ABC 01/01/15 3300400 600
time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)
data grid; do seconds = '09:10't to '16:50't ; output ; end ; run; proc sql ; create table want as select a.id, a.date, a.miliseconds, sum(a.volume) as Volume from have a left join grid1 b on a.id = b.id and a.date = b.date and a.miliseconds = floor(b.seconds) group by a.id, a.date, a.miliseconds ; quit; run;
Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines. Always thankful for input! Best
08-11-2016 03:17 AM
I don't understand why these obs should be the same group ? ID date miliseconds volume GROUP ABC 01/01/15 3300214 100 1 ABC 01/01/15 3300300 200 2 ABC 01/01/15 3300415 300 2 ABC 01/01/15 4400122 400 3 ABC 01/01/15 4500022 200 4 ABC 01/01/15 4500111 400 4
08-11-2016 03:34 AM
Are you sure you're talking milliseconds here, an not "centiseconds"?
data have; input id $ date:mmddyy10. milliseconds volume; next_full_sec = int((milliseconds-1) / 1000) + 1; format date mmddyy10. next_full_sec time8.; cards; ABC 01/01/15 3300214 100 ABC 01/01/15 3300300 200 ABC 01/01/15 3300415 300 ABC 01/01/15 4400122 400 ABC 01/01/15 4500022 200 ABC 01/01/15 4500111 400 ; run; proc summary data=have; by id date next_full_sec; var volume; output out=want (drop=_type_) sum=; run;
08-11-2016 03:58 AM
Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).
The rest of your code looks really useful and thank you really much so far!
08-11-2016 04:05 AM
Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be
9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.
08-11-2016 06:10 AM
This would be my solution to include the grid:
data have; input id $ date:mmddyy10. milliseconds volume; next_full_sec = int((milliseconds-1) / 1000) + 1; format date mmddyy10. next_full_sec time8.; cards; ABC 01/01/15 33002140 100 ABC 01/01/15 33003000 200 ABC 01/01/15 33004150 300 ABC 01/01/15 44001220 400 ABC 01/01/15 45000220 200 ABC 01/01/15 45001110 400 ; run; data grid; do seconds = 1 to 86400; output; end; run; proc summary data=have; by id date next_full_sec; var volume; output out=int (drop=_type_) sum=; run;
/* create a reference table for all IDs, dates, and seconds */ proc sql; create table seconds as select distinct id, date, seconds from have, grid order by id, date, seconds ; quit; data want; merge seconds (in=a rename=(seconds=next_full_sec)) int (in=b) ; by id date next_full_sec; if a; if not b then do; _freq_ = 0; volume = 0; end; run; proc print noobs; run;
08-11-2016 06:55 PM - edited 08-11-2016 06:59 PM
Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:
641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.
It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)
PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though
08-12-2016 02:23 AM
First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.
If your datasets are quite large, I recommend to do two steps:
proc sort data=data.test (keep=id date) out=data.id_date nodupkey ; by id date; run; proc sql; create table data.seconds as select id, date, seconds from data.id_date, data.grid order by id, date, seconds ; quit;
Now the distinct is not necessary, as that has been dealt with in the first step.
The NOTE about the cartesian joins is expected; it's exactly what we want to do.