I have 2 products 'A' and 'B' trading, trade_time and trade_price are recorded variables, so I have
DATA product_A;
INPUT A_datetime datetime16. A_price $;
DATALINES;
22JUN11:00:00:04 1.7
22JUN11:00:00:08 1.9
22JUN11:00:00:10 2.3
22JUN11:00:05:12 2.9
22JUN11:00:05:36 3.5
22JUN11:00:32:45 3.2
22JUN11:00:39:01 3.1
22JUN11:00:39:03 2.7
22JUN11:01:10:12 2.6
22JUN11:07:22:10 2.1
26JUN11:12:42:17 3.4
26JUN11:12:42:18 3.6
26JUN11:12:42:21 3.9
;
RUN;
DATA product_B;
INPUT B_datetime datetime16. B_price $;
DATALINES;
22JUN11:00:00:09 0.02
22JUN11:00:05:34 0.07
22JUN11:00:39:05 0.04
26JUN11:12:42:20 0.09
;
RUN;
Each time 'B' trades, i want to know what 'A' traded at, but only if 'A' traded within the last 5 seconds looking backward.
(i.e if B trades and there are no records in the last 5 seconds for A, then discard, even if 'A' trades one second afterwards, - looking backwards in time only)
So I want to create a new dataset called 'same_trade_times'. which would have the following data,...
DATA same_trade_times;
INPUT B_datetime datetime16.
B_price $
A_datetime $
A_price $
;
DATALINES;
22JUN11:00:00:09 0.02 22JUN11:00:00:08 1.9
22JUN11:00:39:05 0.04 22JUN11:00:39:03 2.7
26JUN11:00:42:20 0.09 26JUN11:12:42:18 3.6
;
RUN;
any ideas on this one?
Logically, SQL seems easier to code:
DATA product_A;
INPUT A_datetime datetime16. A_price $;
DATALINES;
22JUN11:00:00:04 1.7
22JUN11:00:00:08 1.9
22JUN11:00:00:10 2.3
22JUN11:00:05:12 2.9
22JUN11:00:05:36 3.5
22JUN11:00:32:45 3.2
22JUN11:00:39:01 3.1
22JUN11:00:39:03 2.7
22JUN11:01:10:12 2.6
22JUN11:07:22:10 2.1
26JUN11:12:42:17 3.4
26JUN11:12:42:18 3.6
26JUN11:12:42:21 3.9
;
RUN;
DATA product_B;
INPUT B_datetime datetime16. B_price $;
DATALINES;
22JUN11:00:00:09 0.02
22JUN11:00:05:34 0.07
22JUN11:00:39:05 0.04
26JUN11:12:42:20 0.09
;
RUN;
proc sql;
create table same_trade_times as
select b.b_datetime format=datetime16., b_price, a_datetime format=datetime16., a_price
from product_A a,product_b b, (select max(a_datetime) as adt, b_datetime from product_A ,product_b
where 0 <= b_datetime-a_datetime <= 5
group by b_datetime) c
where a.a_datetime=c.adt and b.b_datetime=c.b_datetime
;
quit;
proc print;run;
Regards,
Haikuo
Edit: it is not too bad if hashing.
data want (drop=_rc);
if _n_=1 then do;
set product_A (obs=1);
format a_datetime datetime16.;
dcl hash h(dataset: 'product_A', ordered:'d');
h.definekey('A_datetime');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
set product_b;
format b_datetime datetime16.;
_rc=hi.first();
do while (_rc=0);
if 0<=b_datetime-a_datetime<=5 then do;
output;
return;
end;
_rc=hi.next();
end;
run;
Hai Kuo, thank you so much 🙂 you are a Legend of SAS !!!!
Thank you very much for this overflattering ranking. And Please don't get me killed. seriously, I would have ranked myself aournd 10Kth of SAS legend.
ha, :smileylaugh: not quite a legend, I forgot to say my datasets are huge, product_a has 1.5m obs and product_b has 8k obs, just ran the code, and its 3 minutes in and still going, i feel like my laptop is going to blow up! to much SAS to handle,..
still running! about 7 minutes now!
sorry hau kuo, its taking way to long, my SAS is still running about 10 minutes now, have to mark you as incorrect, my fault really as i did not say how large my data sets where.
but i thought sql was quicker than SAS ???
How can i BREAK out of this enless code run?
Cheers for the EDIT
im running that code now, still taking some time, still running, will report back!
Since your product_a is huge, I have updated a little to the hash code, to increase RAM allocation:
data want (drop=_rc);
if _n_=1 then do;
set product_A (obs=1);
format a_datetime datetime16.;
dcl hash h(dataset: 'product_A', ordered:'d', hashexp:20);
h.definekey('A_datetime');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
set product_b;
format b_datetime datetime16.;
_rc=hi.first();
do while (_rc=0);
if 0<=b_datetime-a_datetime<=5 then do;
output;
return;
end;
_rc=hi.next();
end;
run;
You also should try Tom's code. Let us know your benchmark.
Good Luck,
Haikuo
cool, thanks alot, i will try, Im still trying toms code
ok, just run that above code, about 1 minute in now.
10 minutes runing and still going, will have to end this one
yep it just hangs, and hangs, been going for 10 minutes now have to shut it down!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.