BookmarkSubscribeRSS Feed
machete
Calcite | Level 5

Hi Xia,

i tried your code but it does not work, any ideas why?? by reading it looks ok but it does not execute:

15   proc sql;

16   create table fixing.fixing_eurchfsample1want as

17   select a.*,(select bidprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having

17 !  time eq min(time)) as bidprc5min,

18   (select offerprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having time eq

18 ! min(time))  as offerprc5min

19   from fixing.fixing_eurchfsample1 as a ;

NOTE: The query requires remerging summary statistics back with the original data.

ERROR: Subquery evaluated to more than one row.

NOTE: Correlation values are: time=67400 .

20   quit;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.45 seconds

      cpu time            0.43 seconds

Ksharp
Super User

That is because there are duplicated rows for the same date and same time . E.X

01jan2010 00:08:21  21

01jan2010 00:08:21  54

proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run;

proc sort data=temp out=have nodupkey;by date time;run;

proc sql;

create table want as

select a.*,(select bidprc from have where time ge a.time+300 having time eq min(time) ) as bidprc5min

   , (select offerprc from have where time ge a.time+300 having time eq min(time) ) as offerprc5min

  from have as a ;

quit;

Xia Keshan

Ksharp
Super User

Here is a fast code from PG's good idea .

proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run;
proc sort data=temp out=have ;by date time;run; 
data want(drop= _time);
 set have;
 do i=_n_+1 to nobs;
  set have(keep=time bidprc offerprc rename=( time=_time bidprc=bidprc5min   offerprc=offerpc5mins ) ) nobs=nobs point=i;
  if time+300 le _time then leave;
  call missing(bidprc5min,offerpc5mins);
 end; 
 label bidprc5min= 'bidprc5min'  offerpc5mins= 'offerpc5mins' ;
run;

Xia Keshan

jakarman
Barite | Level 11

If you can validate your strategies when processing those future records as the current one you do not need any reordering, it will be kept as stream-processing.

Just think in an other way to solve time questions. Suppose you have 100 strategies calculated. Evaluation can be done later (log/retain) not necessary at you current record.

---->-- ja karman --<-----
PGStats
Opal | Level 21

Finding the nearest match is always a bit tricky. I set up a time window to find the nearest match between t+120 and t+480 seconds.

The following works OK but is a bit slow. It might be a bit faster if dt was indexed :

libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;

proc sql;

create table s as

select input(dt, datetime20.) as dt format=datetime21.,

    bidprc, offerprc from xl.'Sheet1$'n;

quit;

libname xl clear;

proc sql;

create table want as

select

    a.bidprc, a.offerprc, a.dt,

    b.bidprc as bidprc5,

    b.offerprc as offerprc5,

    b.dt as dt5,

    abs(intck('SECOND', a.dt, b.dt) - 300) as toff

from

    s as a inner join

    s as b

        on b.dt between a.dt+120 and a.dt+480

group by a.dt

having calculated toff = min(calculated toff);

quit;

PG

PG
jakarman
Barite | Level 11

The question as asked is not one for the SQL an not for analytics approaches.

Checking the sample dataset the timing with the events is very random at several second to minutes and longer.

As human you would put your finger on the line searching for the record you need an put new values in.
This a "balance line merge" algorithm known from 3 GL programming languages.

The dedicated problem is:

- merging the dataset to is self 

- assuming the data is sorted (dtcur)

The find condition is not very sophisticated the difference is 5 minutes except for the last records.

Once working this one could be improved.

Using the point option with the set statement moving using directs access in the dataset is possible.

Executing will be fast as long the cache on IO can solve all request (in memory).

Other techniques as sasfile or the hashobject are options with other limitations.

/* create testdata ,  this is from excel using European conventions */

filename fixinp "/folders/myshortcuts/ikke" ;

data fixings ;
infile fixinp(fixingsample.csv) dlm=";" dsd firstobs=2 truncover ;
input date : mmddyy. time : time. currency : $8. bidprc : numx. offerprc : numx.  dtcur : datetime. ;
drop date time ;
eventid = _N_ ;   
format dtcur datetime. ;
run;

/* the set combine/merge using point wiht searching */ 

data fixings300;
  set fixings nobs=fn_nobs ;
  Retain eventlk 1  ; check=1 ;
  do while (eventlk < fn_nobs & check ) ; 
   set fixings (rename=(bidprc=bidprc5 offerprc=offerprc5m eventid=event5m dtcur=dtcur5m) ) point=eventlk  end=end5m ; 
   if ( dtcur+300 > dtcur5m ) then eventlk=eventlk+1 ; else check=0;
  end;
run; 

---->-- ja karman --<-----
PGStats
Opal | Level 21

Much faster solution along the same lines as Jaap. The time window from 0 to 10 minutes is searched for the best match. If no observation occurs in that window, there is no match (missing values are generated)

libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;

proc sql;

create table have as

select input(dt, datetime20.) as dt format=datetime21.,

    bidprc, offerprc from xl.'Sheet1$'n;

quit;

libname xl clear;

/* Number of seconds in the future to seek */

%let matchGoal=300;

/* Maximum number of seconds away from the goal to tolerate*/

%let matchLimit=300;


data want;

set have nobs=nobs;

match = &matchLimit;

call missing(p5, dt5, bidprc5, offerprc5);

do p = _n_+1 to nobs by 1 until (m > match);

    set have(keep=dt rename=dt=_dt) point=p;

    m = abs(intck('SECOND', dt, _dt) - &matchGoal);

    if m < match then do;

        match = m;

        p5 = p;

        end;

    end;

if not missing(p5) then do;

    set have(keep=dt bidprc offerprc

        rename=(dt=dt5 bidprc=bidprc5 offerprc=offerprc5))

        point=p5;

    end;

keep dt bidprc offerprc dt5 bidprc5 offerprc5;

run;

PG

PG
jakarman
Barite | Level 11

To be honest: Gergely did the first reply having this technical approach of moving around in the records.

SAS(R) 9.4 Language Reference: Concepts, Second Edition (see sequential / direct access). The simple incrementing moving forward is not easily seen as it done by coding that set statement.  The using of point= makes that more explicit. It is the same kind as the dow loop.

The simple incrementing works as you do not need every time the search form the current record. Those records are also incrementing in time.  Being able to move around (up and down) will let you develop very sophisticated options for finding references.

I am wondering how fast this will be on really big data.

---->-- ja karman --<-----
Ksharp
Super User

It is really fast I believed . That is the big advantage of SAS .

If you could test it how fast it would be .

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1859 views
  • 3 likes
  • 7 in conversation