Help using Base SAS procedures

Collating/summarizing observations over a time interval

Reply
Occasional Contributor
Posts: 7

Collating/summarizing observations over a time interval

Hi there,

 

have a set of trading data that has observations (timed in milliseconds from midnight) but infrequent. To make it a reasonable timeseries I did a do loop to build a grid (one observation per second should be the outcome). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second. I am very thankful for any ideas. See exampel and faulty code below.

 

Grid:

Seconds 
3300100
3300200
3300300
.
.

Data example:

ID date miliseconds volume
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400

Output:

ID date seconds volume
ABC 01/01/15 3300100 100
ABC 01/01/15 3300200 500
ABC 01/01/15 3300300 400
ABC 01/01/15 3300400 600

(

time is miliseconds from midnught (just a numerical example that is not correct in terms of summing up)

 

 

data grid;
do seconds = '09:10't to '16:50't ;
    output ;
  end ;
run;

 proc sql ;
  create table want as
  select a.id, a.date, a.miliseconds, sum(a.volume) as Volume
  from have a
     left join
       grid1 b  on a.id = b.id
                    and a.date = b.date
                    and a.miliseconds = floor(b.seconds)
  group by a.id, a.date, a.miliseconds ;
quit;
run;  

 

Unfortunately does that not work the way I want. Anything I'm missing? So, ID and date couldn't be found, so I tried to create them in the data step. But then ID isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines.  Always thankful for input! Best

Super User
Posts: 9,682

Re: Collating/summarizing observations over a time interval

I don't understand why these obs should be the same group ?

ID date miliseconds volume    GROUP
ABC 01/01/15 3300214 100    1
ABC 01/01/15 3300300 200    2
ABC 01/01/15 3300415 300    2
ABC 01/01/15 4400122 400    3
ABC 01/01/15 4500022 200    4
ABC 01/01/15 4500111 400    4

Occasional Contributor
Posts: 7

Re: Collating/summarizing observations over a time interval

They should be summed up to the next full second
Super User
Posts: 6,948

Re: Collating/summarizing observations over a time interval

Are you sure you're talking milliseconds here, an not "centiseconds"?

My solution:

data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 3300214 100
ABC 01/01/15 3300300 200
ABC 01/01/15 3300415 300
ABC 01/01/15 4400122 400
ABC 01/01/15 4500022 200
ABC 01/01/15 4500111 400
;
run;

proc summary data=have;
by id date next_full_sec;
var volume;
output out=want (drop=_type_) sum=;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 7

Re: Collating/summarizing observations over a time interval

Hi Kurt, thank you for the code. It seems to work but it doesn't show the seconds without trades (which should show a 0). Also, I really mean miliseconds from midnight compressed in seconds from midnight, thus it should start at 33000 (seconds from midnight, which is 9:10am).

The rest of your code looks really useful and thank you really much so far!

Super User
Posts: 6,948

Re: Collating/summarizing observations over a time interval

Then your values are not milliseconds; keep in mind that one second has 1000 milliseconds, so the value for 09:00:00 would be

9 * 3600 * 1000, which evaluates to 32400000. I guess you are missing a zero at the end of your example data's milliseconds. Or they really are centiseconds.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 7

Re: Collating/summarizing observations over a time interval

You're right. Still, I want my given millisecond values be summed up onto the full second grid (starting 33000).
Super User
Posts: 6,948

Re: Collating/summarizing observations over a time interval

This would be my solution to include the grid:

data have;
input id $ date:mmddyy10. milliseconds volume;
next_full_sec = int((milliseconds-1) / 1000) + 1;
format date mmddyy10. next_full_sec time8.;
cards;
ABC 01/01/15 33002140 100
ABC 01/01/15 33003000 200
ABC 01/01/15 33004150 300
ABC 01/01/15 44001220 400
ABC 01/01/15 45000220 200
ABC 01/01/15 45001110 400
;
run;

data grid;
do seconds = 1 to 86400;
  output;
end;
run;

proc summary data=have;
by id date next_full_sec;
var volume;
output out=int (drop=_type_) sum=;
run;

/* create a reference table for all IDs, dates, and seconds */ proc sql; create table seconds as select distinct id, date, seconds from have, grid order by id, date, seconds ; quit; data want; merge seconds (in=a rename=(seconds=next_full_sec)) int (in=b) ; by id date next_full_sec; if a; if not b then do; _freq_ = 0; volume = 0; end; run; proc print noobs; run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 7

Re: Collating/summarizing observations over a time interval

[ Edited ]

Kurt, thank you so much for your help so far and it looks like it's nearly done. Now I just get the error:

 

641 proc sql;
642 create table data.seconds as
643 select id, date, seconds
644 from data.test, data.grid
645 order by id, date, seconds
646 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
ERROR: User asked for termination.

 

647 quit;

 

 

It's the only thing left to solve I guess. I had to terminate the step after ca. 30min as it wouldn't work apparantly (the data I have is not that big...)

 

Best

 

PS: Yes I googled the Error message but SAS help just mentions to apply the second maintenance release for SAS 9.2. I have 9.4 though

Super User
Posts: 6,948

Re: Collating/summarizing observations over a time interval

Hi!

 

First of all, you missed the "distinct" in the select. This means that the output dataset would have obs(test) * obs(grid), which probably blows up your space in DATA.

If your datasets are quite large, I recommend to do two steps:

proc sort
  data=data.test (keep=id date)
  out=data.id_date
  nodupkey
;
by id date;
run;

proc sql;
create table data.seconds as
select id, date, seconds
from data.id_date, data.grid
order by id, date, seconds
;
quit;

Now the distinct is not necessary, as that has been dealt with in the first step.

The NOTE about the cartesian joins is expected; it's exactly what we want to do.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 9 replies
  • 293 views
  • 0 likes
  • 3 in conversation