BookmarkSubscribeRSS Feed
MaBo1011
Calcite | Level 5

Hi there,

 

have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.

 

Grid:

Seconds 
3300100
3300200
3300300
.
.

Data example:

ID date miliseconds volume
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400

Output:

ID date seconds volume
ABC 01/01/15 3300100 100
ABC 01/01/15 3300200 500
ABC 01/01/15 3300300 400
ABC 01/01/15 3300400 600

(

time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)

 

 

data grid;
do seconds = '09:10't to '16:50't ;
    output ;
  end ;
run;

 proc sql ;
  create table want as
  select a.id, a.date, a.miliseconds, sum(a.volume) as Volume
  from have a
     left join
       grid1 b  on a.id = b.id
                    and a.date = b.date
                    and a.miliseconds = floor(b.seconds)
  group by a.id, a.date, a.miliseconds ;
quit;
run;  

 

Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines.  Always thankful for input! Best

9 REPLIES 9
Ksharp
Super User
I don't understand why these obs should be the same group ?

ID date miliseconds volume    GROUP
ABC 01/01/15 3300214 100    1
ABC 01/01/15 3300300 200    2
ABC 01/01/15 3300415 300    2
ABC 01/01/15 4400122 400    3
ABC 01/01/15 4500022 200    4
ABC 01/01/15 4500111 400    4

MaBo1011
Calcite | Level 5
They should be summed up to the next full second
Kurt_Bremser
Super User

Are you sure you're talking milliseconds here, an not "centiseconds"?

My solution:

data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
;
run;

proc summary data=have;
by id date next_full_sec;
var volume;
output out=want (drop=_type_) sum=;
run;
MaBo1011
Calcite | Level 5

Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).

The rest of your code looks really useful and thank you really much so far!

Kurt_Bremser
Super User

Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be

9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.

MaBo1011
Calcite | Level 5
You're right. Still, I want my given millisecond values be summed up onto the full second grid (starting 33000).
Kurt_Bremser
Super User

This would be my solution to include the grid:

data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 33002140 100
ABC 01/01/15 33003000 200
ABC 01/01/15 33004150 300
ABC 01/01/15 44001220 400
ABC 01/01/15 45000220 200
ABC 01/01/15 45001110 400
;
run;

data grid;
do seconds = 1 to 86400;
  output;
end;
run;

proc summary data=have;
by id date next_full_sec;
var volume;
output out=int (drop=_type_) sum=;
run;

/* create a reference table for all IDs, dates, and seconds */ proc sql; create table seconds as select distinct id, date, seconds from have, grid order by id, date, seconds ; quit; data want; merge seconds (in=a rename=(seconds=next_full_sec)) int (in=b) ; by id date next_full_sec; if a; if not b then do; _freq_ = 0; volume = 0; end; run; proc print noobs; run;
MaBo1011
Calcite | Level 5

Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:

 

641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
646 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.

 

647 quit;

 

 

It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)

 

Best

 

PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though

Kurt_Bremser
Super User

Hi!

 

First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.

If your datasets are quite large, I recommend to do two steps:

proc sort
  data=data.test (keep=id date)
  out=data.id_date
  nodupkey
;
by id date;
run;

proc sql;
create table data.seconds as
select id, date, seconds
from data.id_date, data.grid
order by id, date, seconds
;
quit;

Now the distinct is not necessary, as that has been dealt with in the first step.

The NOTE about the cartesian joins is expected; it's exactly what we want to do.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1978 views
  • 0 likes
  • 3 in conversation