BookmarkSubscribeRSS Feed
machete
Calcite | Level 5

Dear all,

I have a financial time series dataset with a frequency of seconds.

To perform statistical analysis I need to search and locate for every record (row), the record which is 5minutes ahead of time.

Then i would like to retain from the record which is 5minutes ahead two observations, the columns bidprc and offerprc and add then next to the original record.

For example if we take from my attached data:

time           currency  bidprc      offerprc           dt                                     bidprc5min  offerpc5mins

08:07:09   dem/chf   0.86760  0.86770          09JAN97:08:07:09      to be located  to be located

I would like to locate above the record which is in at 08:12:09 or if no such record exists the one which is closest to that time.

Is it possible to perform the above search without creating two datasets and e.g using proc sql which is efficient and quick?

Any ideas on how to program the above are more than welcome

Thanks!!!

Neo

23 REPLIES 23
ballardw
Super User

Does the data ALWAYS have a record at exactly +5 minutes (300 seconds)?

If so something like this may work

proc sql;

     create table newdata as

     select a.*, b.bidprc as bidprc5min, b.offerprc as offerprc5min

     from have as a left join have as b

          on a.time = (b.time -300);

quit;

machete
Calcite | Level 5

Hi Ballardw,

I will check if the above works. However there isnt a record always at 300 seconds and I would like to make sure that there is always a record in the new tables, so it would be best if the code you proposed can locate the next closest record e.g in 301 seconds or more....

Is it possible to do this?

Thnx

Neo

machete
Calcite | Level 5

The code you provided generates additional records. The sample has 50000 but  the newdata has 68350...any ideas why is that?

also if i run the code on my whole dataset (>1 million records) i get many empty ones, where values have not been located. is it possible to change the code as below so that i can get results for all my records?

on a.time >= (b.time - 300)

thanks again!

Ksharp
Super User

if you want the next closed record ,why not fetch the next obs ? if your table has already been sorted by DT .

proc import datafile='c:\temp\fixing sample.xls' out=have dbms=xls replace;run;

data want;
merge have have(firstobs=2  keep=bidprc offerprc rename=(bidprc=bidprc5min offerprc=offerprc5min));
label  bidprc5min=' '  offerprc5min=' ';
run;

Xia Keshan

machete
Calcite | Level 5

Hi Xia,

the task is to find the record which is 5min ahead in time or if there is no such record then the one which is closest to 5mins....

The code that Ballardw provided was simply an inner join which works only if the data have ALWAYS a matchning record....which is NOT the case so I need some kind of an sql search algorithm...

Any ideas will help

Thnx

Neo

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Can you assign each row to a grouping variable, for instance actual time vs planned timepoint.

HAVE

ACTUAL_TIME     BASE_TIME          GROUP_ASSIGNED     GROUPN

12:34                     12:34                     BASELINE                       1

12:36                      12:34                    BASELINE+5                    2

12:37                      12:34                    BASELINE+5                    2

12:43                      12:34                    BASELINE+10                  3

12:47                      12:34                    BASELINE+15                  4

...

You can then group data into groups using the new variable irrespective of actual time, i.e;

select     FIRST.ACTUAL_TIME,

               NEXT.ACTUAL_TIME

from          HAVE FIRST

left join       HAVE NEXT

on          FIRST.GROUPN=NEXT.GROUPN-1

machete
Calcite | Level 5

Hi,

well I am nost sure how this will work as the orginal dataset will have 20million rows...re arranging the whole dataset into groups will be difficult...a search algorithm will work easier however I dont know how to program it....

Neo

gergely_batho
SAS Employee

Hi Neo, I realy suggest you to try the above mentioned approach. It runs very fast: 0.04 sec on the attached data. It needs only one pass through the data. 20mill rows will be not a problem.

With some more programming it can be extended to support different lookahead and lookback times. Or to support different currencies or more trading days.

machete
Calcite | Level 5

Hi Gergely,

by above you mean the one you proposed?

I will run it and let you know, however by looking at the code I am almost sure that some records will be empty/unmatched since the records are not at regular intervals, it means there might be no record exactly at 5min ahead but 5minutes and 5 seconds and the code will need to check is there is no record to pick the next closest,in this case at 5mins and 5 seconds....I will let you know as soon as I check yours

Neo

gergely_batho
SAS Employee

Yes Neo, I mean my post. It is not a simple merge or a simple interleaving. Look at it carefully. If there is no matching record exactly at 5 minutes, it will use the nearest before 5 minutes. Exactly as you described.

I suggest to look at it for everyone! The beauty of this technique is that it is the most natural approach. You would do the same using pen and paper.

The same approach can be used to handle a lot of problems where you have time stamped data (transactions, dimension tables, SCD2, etc.). Calculating moving averages, moving correlations, "merging" dimension tables and/or fact tables, look-back and look-ahead and more.

It works for regularly and irregularly timestamped data too. It works it there are more transactions on the same timestamp - one of them is chosen (the last one physically).

Is is extremely fast (basically one pass through the data - what could be faster?). There is only one variant of it, which might be slightly faster, but it is a bit more complicated code.

jakarman
Barite | Level 11

Can you change your logic?

It is a bit of strange thinking for future events that has happened already. It could be more easier thinking on calculations that have been happened in the paste.

That is more like stream processing. In that case the whole issue could be easier technical to solve (lag/retain) in am more generic way.  
Validations on predictions (modeling) is a possible reason for your question. 

---->-- ja karman --<-----
machete
Calcite | Level 5

i am doing both looking in the past but also tryng to 'validate' trading strategies...this is why I want to 'fetch' the records which are ahead (leading) in time fox x minutes

gergely_batho
SAS Employee

A trick used here: Merging two datasets with different but similar columns helps here.

Dataset should be sorted by time.


data have_minus5 / view= have_minus5;

               set have;

               time=time-'00:05:00't;/*or use intnx()*/

               rename bidprc=bidprc5min offerprc=offerpc5mins;

run;

data want;

  set have_minus5(in=inH5 keep=time) have(in=inH keep=time);

  by time;

  if inH then do;

            set have;

            output;

  end;

  if inH5 then do;

       set have_minus5;

  end;

run;

Sorry, this is not tested. Tested. And yes, this is like stream processing.  ~ one pass through the data.

Be careful when copying apostrophes from this page, they are converted to something else.

I have changed this set statement in the code: set have(in=inH keep=time) have_minus5(in=inH5 keep=time);

It matters which dataset comes first. By changing dataset order in this set statement, you have a very fine control, over handling equal timestamps.

Message was edited by: Gergely Bathó

Ksharp
Super User

Here is . But it cost me two minutes to run it. Let me know if it could worked.

proc import datafile='c:\temp\fixing sample.xls' out=have dbms=xls replace;run;

proc sql;
create table want as
 select a.*,(select bidprc from have where time ge a.time+300 having time eq min(time) ) as bidprc5min
   , (select offerprc from have where time ge a.time+300 having time eq min(time) ) as offerprc5min
  from have as a ;
quit;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1774 views
  • 3 likes
  • 7 in conversation