I have 2 products 'A' and 'B' trading, trade_time and trade_price are recorded variables, so I have
DATA product_A;
INPUT A_datetime datetime16. A_price $;
DATALINES;
22JUN11:00:00:04 1.7
22JUN11:00:00:08 1.9
22JUN11:00:00:10 2.3
22JUN11:00:05:12 2.9
22JUN11:00:05:36 3.5
22JUN11:00:32:45 3.2
22JUN11:00:39:01 3.1
22JUN11:00:39:03 2.7
22JUN11:01:10:12 2.6
22JUN11:07:22:10 2.1
26JUN11:12:42:17 3.4
26JUN11:12:42:18 3.6
26JUN11:12:42:21 3.9
;
RUN;
DATA product_B;
INPUT B_datetime datetime16. B_price $;
DATALINES;
22JUN11:00:00:09 0.02
22JUN11:00:05:34 0.07
22JUN11:00:39:05 0.04
26JUN11:12:42:20 0.09
;
RUN;
Each time 'B' trades, i want to know what 'A' traded at, but only if 'A' traded within the last 5 seconds looking backward.
(i.e if B trades and there are no records in the last 5 seconds for A, then discard, even if 'A' trades one second afterwards, - looking backwards in time only)
So I want to create a new dataset called 'same_trade_times'. which would have the following data,...
DATA same_trade_times;
INPUT B_datetime datetime16.
B_price $
A_datetime $
A_price $
;
DATALINES;
22JUN11:00:00:09 0.02 22JUN11:00:00:08 1.9
22JUN11:00:39:05 0.04 22JUN11:00:39:03 2.7
26JUN11:00:42:20 0.09 26JUN11:12:42:18 3.6
;
RUN;
any ideas on this one?
Logically, SQL seems easier to code:
DATA product_A;
INPUT A_datetime datetime16. A_price $;
DATALINES;
22JUN11:00:00:04 1.7
22JUN11:00:00:08 1.9
22JUN11:00:00:10 2.3
22JUN11:00:05:12 2.9
22JUN11:00:05:36 3.5
22JUN11:00:32:45 3.2
22JUN11:00:39:01 3.1
22JUN11:00:39:03 2.7
22JUN11:01:10:12 2.6
22JUN11:07:22:10 2.1
26JUN11:12:42:17 3.4
26JUN11:12:42:18 3.6
26JUN11:12:42:21 3.9
;
RUN;
DATA product_B;
INPUT B_datetime datetime16. B_price $;
DATALINES;
22JUN11:00:00:09 0.02
22JUN11:00:05:34 0.07
22JUN11:00:39:05 0.04
26JUN11:12:42:20 0.09
;
RUN;
proc sql;
create table same_trade_times as
select b.b_datetime format=datetime16., b_price, a_datetime format=datetime16., a_price
from product_A a,product_b b, (select max(a_datetime) as adt, b_datetime from product_A ,product_b
where 0 <= b_datetime-a_datetime <= 5
group by b_datetime) c
where a.a_datetime=c.adt and b.b_datetime=c.b_datetime
;
quit;
proc print;run;
Regards,
Haikuo
Edit: it is not too bad if hashing.
data want (drop=_rc);
if _n_=1 then do;
set product_A (obs=1);
format a_datetime datetime16.;
dcl hash h(dataset: 'product_A', ordered:'d');
h.definekey('A_datetime');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
set product_b;
format b_datetime datetime16.;
_rc=hi.first();
do while (_rc=0);
if 0<=b_datetime-a_datetime<=5 then do;
output;
return;
end;
_rc=hi.next();
end;
run;
Hai Kuo, thank you so much 🙂 you are a Legend of SAS !!!! ![]()
Thank you very much for this overflattering ranking. And Please don't get me killed. seriously, I would have ranked myself aournd 10Kth of SAS legend. ![]()
ha, :smileylaugh: not quite a legend, I forgot to say my datasets are huge, product_a has 1.5m obs and product_b has 8k obs, just ran the code, and its 3 minutes in and still going, i feel like my laptop is going to blow up! to much SAS to handle,..
still running! about 7 minutes now!
sorry hau kuo, its taking way to long, my SAS is still running about 10 minutes now, have to mark you as incorrect, my fault really as i did not say how large my data sets where.
but i thought sql was quicker than SAS ???
How can i BREAK out of this enless code run?
Cheers for the EDIT
im running that code now, still taking some time, still running, will report back!
Since your product_a is huge, I have updated a little to the hash code, to increase RAM allocation:
data want (drop=_rc);
if _n_=1 then do;
set product_A (obs=1);
format a_datetime datetime16.;
dcl hash h(dataset: 'product_A', ordered:'d', hashexp:20);
h.definekey('A_datetime');
h.definedata(all:'y');
h.definedone();
dcl hiter hi('h');
end;
set product_b;
format b_datetime datetime16.;
_rc=hi.first();
do while (_rc=0);
if 0<=b_datetime-a_datetime<=5 then do;
output;
return;
end;
_rc=hi.next();
end;
run;
You also should try Tom's code. Let us know your benchmark.
Good Luck,
Haikuo
cool, thanks alot, i will try, Im still trying toms code ![]()
ok, just run that above code, about 1 minute in now.
10 minutes runing and still going, will have to end this one ![]()
yep it just hangs, and hangs, been going for 10 minutes now have to shut it down!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.