Hi Xia,
i tried your code but it does not work, any ideas why?? by reading it looks ok but it does not execute:
15 proc sql;
16 create table fixing.fixing_eurchfsample1want as
17 select a.*,(select bidprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having
17 ! time eq min(time)) as bidprc5min,
18 (select offerprc from fixing.fixing_eurchfsample1 where time ge a.time+300 having time eq
18 ! min(time)) as offerprc5min
19 from fixing.fixing_eurchfsample1 as a ;
NOTE: The query requires remerging summary statistics back with the original data.
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: time=67400 .
20 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.45 seconds
cpu time 0.43 seconds
That is because there are duplicated rows for the same date and same time . E.X
01jan2010 00:08:21 21
01jan2010 00:08:21 54
proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run;
proc sort data=temp out=have nodupkey;by date time;run;
proc sql;
create table want as
select a.*,(select bidprc from have where time ge a.time+300 having time eq min(time) ) as bidprc5min
, (select offerprc from have where time ge a.time+300 having time eq min(time) ) as offerprc5min
from have as a ;
quit;
Xia Keshan
Here is a fast code from PG's good idea .
proc import datafile='c:\temp\fixingsample.xls' out=temp dbms=xls replace;run; proc sort data=temp out=have ;by date time;run; data want(drop= _time); set have; do i=_n_+1 to nobs; set have(keep=time bidprc offerprc rename=( time=_time bidprc=bidprc5min offerprc=offerpc5mins ) ) nobs=nobs point=i; if time+300 le _time then leave; call missing(bidprc5min,offerpc5mins); end; label bidprc5min= 'bidprc5min' offerpc5mins= 'offerpc5mins' ; run;
Xia Keshan
If you can validate your strategies when processing those future records as the current one you do not need any reordering, it will be kept as stream-processing.
Just think in an other way to solve time questions. Suppose you have 100 strategies calculated. Evaluation can be done later (log/retain) not necessary at you current record.
Finding the nearest match is always a bit tricky. I set up a time window to find the nearest match between t+120 and t+480 seconds.
The following works OK but is a bit slow. It might be a bit faster if dt was indexed :
libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;
proc sql;
create table s as
select input(dt, datetime20.) as dt format=datetime21.,
bidprc, offerprc from xl.'Sheet1$'n;
quit;
libname xl clear;
proc sql;
create table want as
select
a.bidprc, a.offerprc, a.dt,
b.bidprc as bidprc5,
b.offerprc as offerprc5,
b.dt as dt5,
abs(intck('SECOND', a.dt, b.dt) - 300) as toff
from
s as a inner join
s as b
on b.dt between a.dt+120 and a.dt+480
group by a.dt
having calculated toff = min(calculated toff);
quit;
PG
The question as asked is not one for the SQL an not for analytics approaches.
Checking the sample dataset the timing with the events is very random at several second to minutes and longer.
As human you would put your finger on the line searching for the record you need an put new values in.
This a "balance line merge" algorithm known from 3 GL programming languages.
The dedicated problem is:
- merging the dataset to is self
- assuming the data is sorted (dtcur)
The find condition is not very sophisticated the difference is 5 minutes except for the last records.
Once working this one could be improved.
Using the point option with the set statement moving using directs access in the dataset is possible.
Executing will be fast as long the cache on IO can solve all request (in memory).
Other techniques as sasfile or the hashobject are options with other limitations.
/* create testdata , this is from excel using European conventions */
filename fixinp "/folders/myshortcuts/ikke" ;
data fixings ;
infile fixinp(fixingsample.csv) dlm=";" dsd firstobs=2 truncover ;
input date : mmddyy. time : time. currency : $8. bidprc : numx. offerprc : numx. dtcur : datetime. ;
drop date time ;
eventid = _N_ ;
format dtcur datetime. ;
run;
/* the set combine/merge using point wiht searching */
data fixings300;
set fixings nobs=fn_nobs ;
Retain eventlk 1 ; check=1 ;
do while (eventlk < fn_nobs & check ) ;
set fixings (rename=(bidprc=bidprc5 offerprc=offerprc5m eventid=event5m dtcur=dtcur5m) ) point=eventlk end=end5m ;
if ( dtcur+300 > dtcur5m ) then eventlk=eventlk+1 ; else check=0;
end;
run;
Much faster solution along the same lines as Jaap. The time window from 0 to 10 minutes is searched for the best match. If no observation occurs in that window, there is no match (missing values are generated)
libname xl Excel "&sasforum.\datasets\fixing sample.xls" access=readonly;
proc sql;
create table have as
select input(dt, datetime20.) as dt format=datetime21.,
bidprc, offerprc from xl.'Sheet1$'n;
quit;
libname xl clear;
/* Number of seconds in the future to seek */
%let matchGoal=300;
/* Maximum number of seconds away from the goal to tolerate*/
%let matchLimit=300;
data want;
set have nobs=nobs;
match = &matchLimit;
call missing(p5, dt5, bidprc5, offerprc5);
do p = _n_+1 to nobs by 1 until (m > match);
set have(keep=dt rename=dt=_dt) point=p;
m = abs(intck('SECOND', dt, _dt) - &matchGoal);
if m < match then do;
match = m;
p5 = p;
end;
end;
if not missing(p5) then do;
set have(keep=dt bidprc offerprc
rename=(dt=dt5 bidprc=bidprc5 offerprc=offerprc5))
point=p5;
end;
keep dt bidprc offerprc dt5 bidprc5 offerprc5;
run;
PG
To be honest: Gergely did the first reply having this technical approach of moving around in the records.
SAS(R) 9.4 Language Reference: Concepts, Second Edition (see sequential / direct access). The simple incrementing moving forward is not easily seen as it done by coding that set statement. The using of point= makes that more explicit. It is the same kind as the dow loop.
The simple incrementing works as you do not need every time the search form the current record. Those records are also incrementing in time. Being able to move around (up and down) will let you develop very sophisticated options for finding references.
I am wondering how fast this will be on really big data.
It is really fast I believed . That is the big advantage of SAS .
If you could test it how fast it would be .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.