How to locate a record in a time series that is 'x' minutes ahead in t... - Page 2

machete · Posted 08-01-2014 12:57 PM

Hi Xia,

i tried your code but it does not work, any ideas why?? by reading it looks ok but it does not execute:

15 proc sql;

16 create table fixing.fixing_eurchfsample1want as

17 select a.*,(select bidprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having

17 ! time eq min(time)) as bidprc5min,

18 (select offerprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having time eq

18 ! min(time)) as offerprc5min

19 from fixing.fixing_eurchfsample1 as a ;

NOTE: The query requires remerging summary statistics back with the original data.

ERROR: Subquery evaluated to more than one row.

NOTE: Correlation values are: time=67400 .

20 quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

real time 0.45 seconds

cpu time 0.43 seconds

Ksharp · Posted 08-02-2014 02:38 AM

That is because there are duplicated rows for the same date and same time . E.X

01jan2010 00:08:21 21

01jan2010 00:08:21 54

proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run;

proc sort data=temp out=have nodupkey;by date time;run;

proc sql;

create table want as

select a.*,(select bidprc from have where time ge a.time+300 having time eq min(time) ) as bidprc5min

, (select offerprc from have where time ge a.time+300 having time eq min(time) ) as offerprc5min

from have as a ;

quit;

Xia Keshan

Ksharp · Posted 08-03-2014 01:11 AM

Here is a fast code from PG's good idea .

proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run;
proc sort data=temp out=have ;by date time;run; 
data want(drop= _time);
 set have;
 do i=_n_+1 to nobs;
  set have(keep=time bidprc offerprc rename=( time=_time bidprc=bidprc5min   offerprc=offerpc5mins ) ) nobs=nobs point=i;
  if time+300 le _time then leave;
  call missing(bidprc5min,offerpc5mins);
 end; 
 label bidprc5min= 'bidprc5min'  offerpc5mins= 'offerpc5mins' ;
run;

Xia Keshan

jakarman · Posted 08-01-2014 04:04 AM

If you can validate your strategies when processing those future records as the current one you do not need any reordering, it will be kept as stream-processing.

Just think in an other way to solve time questions. Suppose you have 100 strategies calculated. Evaluation can be done later (log/retain) not necessary at you current record.

---->-- ja karman --<-----

PGStats · Posted 08-01-2014 02:47 PM

Finding the nearest match is always a bit tricky. I set up a time window to find the nearest match between t+120 and t+480 seconds.

The following works OK but is a bit slow. It might be a bit faster if dt was indexed :

libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;

proc sql;

create table s as

select input(dt, datetime20.) as dt format=datetime21.,

bidprc, offerprc from xl.'Sheet1$'n;

quit;

libname xl clear;

proc sql;

create table want as

select

a.bidprc, a.offerprc, a.dt,

b.bidprc as bidprc5,

b.offerprc as offerprc5,

b.dt as dt5,

abs(intck('SECOND', a.dt, b.dt) - 300) as toff

from

s as a inner join

s as b

on b.dt between a.dt+120 and a.dt+480

group by a.dt

having calculated toff = min(calculated toff);

quit;

PG

jakarman · Posted 08-01-2014 06:17 PM

The question as asked is not one for the SQL an not for analytics approaches.

Checking the sample dataset the timing with the events is very random at several second to minutes and longer.

As human you would put your finger on the line searching for the record you need an put new values in.
This a "balance line merge" algorithm known from 3 GL programming languages.

The dedicated problem is:

- merging the dataset to is self

- assuming the data is sorted (dtcur)

The find condition is not very sophisticated the difference is 5 minutes except for the last records.

Once working this one could be improved.

Using the point option with the set statement moving using directs access in the dataset is possible.

Executing will be fast as long the cache on IO can solve all request (in memory).

Other techniques as sasfile or the hashobject are options with other limitations.

/* create testdata , this is from excel using European conventions */

filename fixinp "/folders/myshortcuts/ikke" ;

data fixings ;
infile fixinp(fixingsample.csv) dlm=";" dsd firstobs=2 truncover ;
input date : mmddyy. time : time. currency : $8. bidprc : numx. offerprc : numx. dtcur : datetime. ;
drop date time ;
eventid = _N_ ;
format dtcur datetime. ;
run;

/* the set combine/merge using point wiht searching */

data fixings300;
set fixings nobs=fn_nobs ;
Retain eventlk 1 ; check=1 ;
do while (eventlk < fn_nobs & check ) ;
set fixings (rename=(bidprc=bidprc5 offerprc=offerprc5m eventid=event5m dtcur=dtcur5m) ) point=eventlk end=end5m ;
if ( dtcur+300 > dtcur5m ) then eventlk=eventlk+1 ; else check=0;
end;
run;

---->-- ja karman --<-----

PGStats · Posted 08-01-2014 11:14 PM

Much faster solution along the same lines as Jaap. The time window from 0 to 10 minutes is searched for the best match. If no observation occurs in that window, there is no match (missing values are generated)

libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;

proc sql;

create table have as

select input(dt, datetime20.) as dt format=datetime21.,

bidprc, offerprc from xl.'Sheet1$'n;

quit;

libname xl clear;

/* Number of seconds in the future to seek */

%let matchGoal=300;

/* Maximum number of seconds away from the goal to tolerate*/

%let matchLimit=300;

data want;

set have nobs=nobs;

match = &matchLimit;

call missing(p5, dt5, bidprc5, offerprc5);

do p = _n_+1 to nobs by 1 until (m > match);

set have(keep=dt rename=dt=_dt) point=p;

m = abs(intck('SECOND', dt, _dt) - &matchGoal);

if m < match then do;

match = m;

p5 = p;

end;

if not missing(p5) then do;

set have(keep=dt bidprc offerprc

rename=(dt=dt5 bidprc=bidprc5 offerprc=offerprc5))

point=p5;

end;

keep dt bidprc offerprc dt5 bidprc5 offerprc5;

run;

PG

jakarman · Posted 08-03-2014 01:57 AM

To be honest: Gergely did the first reply having this technical approach of moving around in the records.

SAS(R) 9.4 Language Reference: Concepts, Second Edition (see sequential / direct access). The simple incrementing moving forward is not easily seen as it done by coding that set statement. The using of point= makes that more explicit. It is the same kind as the dow loop.

The simple incrementing works as you do not need every time the search form the current record. Those records are also incrementing in time. Being able to move around (up and down) will let you develop very sophisticated options for finding references.

I am wondering how fast this will be on really big data.

---->-- ja karman --<-----

Ksharp · Posted 08-03-2014 03:13 AM

It is really fast I believed . That is the big advantage of SAS .

If you could test it how fast it would be .

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

Re: How to locate a record in a time series that is 'x' minutes ahead in time?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away