BookmarkSubscribeRSS Feed
skipper
Calcite | Level 5

I have 2 products 'A' and 'B' trading, trade_time and trade_price are recorded variables, so I have

DATA product_A;

   INPUT A_datetime datetime16. A_price $;

   DATALINES;

22JUN11:00:00:04    1.7

22JUN11:00:00:08 1.9

22JUN11:00:00:10 2.3

22JUN11:00:05:12 2.9

22JUN11:00:05:36 3.5

22JUN11:00:32:45 3.2

22JUN11:00:39:01 3.1

22JUN11:00:39:03 2.7

22JUN11:01:10:12 2.6

22JUN11:07:22:10 2.1

26JUN11:12:42:17 3.4

26JUN11:12:42:18 3.6

26JUN11:12:42:21 3.9

;

RUN;

DATA product_B;

   INPUT B_datetime datetime16. B_price $;

   DATALINES;

22JUN11:00:00:09    0.02

22JUN11:00:05:34    0.07

22JUN11:00:39:05    0.04

26JUN11:12:42:20    0.09

;

RUN;

Each time 'B' trades, i want to know what 'A' traded at, but only if 'A' traded within the last 5 seconds looking backward.

(i.e if B trades and there are no records in the last 5 seconds for A, then discard, even if 'A' trades one second afterwards, - looking backwards in time only)

So I want to create a new dataset called 'same_trade_times'. which would have the following data,...

DATA same_trade_times;

   INPUT B_datetime datetime16.

         B_price $

         A_datetime $

         A_price $

         ;

   DATALINES;

22JUN11:00:00:09   0.02   22JUN11:00:00:08    1.9

22JUN11:00:39:05   0.04   22JUN11:00:39:03    2.7

26JUN11:00:42:20   0.09   26JUN11:12:42:18    3.6

;

RUN;

any ideas on this one?

13 REPLIES 13
Haikuo
Onyx | Level 15

Logically, SQL seems easier to code:

DATA product_A;

   INPUT A_datetime datetime16. A_price $;

   DATALINES;

22JUN11:00:00:04    1.7

22JUN11:00:00:08 1.9

22JUN11:00:00:10 2.3

22JUN11:00:05:12 2.9

22JUN11:00:05:36 3.5

22JUN11:00:32:45 3.2

22JUN11:00:39:01 3.1

22JUN11:00:39:03 2.7

22JUN11:01:10:12 2.6

22JUN11:07:22:10 2.1

26JUN11:12:42:17 3.4

26JUN11:12:42:18 3.6

26JUN11:12:42:21 3.9

;

RUN;

DATA product_B;

   INPUT B_datetime datetime16. B_price $;

   DATALINES;

22JUN11:00:00:09    0.02

22JUN11:00:05:34    0.07

22JUN11:00:39:05    0.04

26JUN11:12:42:20    0.09

;

RUN;

proc sql;

create table same_trade_times  as

select b.b_datetime format=datetime16., b_price, a_datetime format=datetime16., a_price

from product_A a,product_b b, (select max(a_datetime) as adt, b_datetime from product_A ,product_b

   where 0 <= b_datetime-a_datetime <= 5

   group by b_datetime) c

where a.a_datetime=c.adt and b.b_datetime=c.b_datetime

   ;

quit;

proc print;run;

Regards,

Haikuo

Edit: it is not too bad if hashing.

data want (drop=_rc);

  if _n_=1 then do;

     set product_A (obs=1);

format a_datetime datetime16.;

dcl hash h(dataset: 'product_A', ordered:'d');

h.definekey('A_datetime');

h.definedata(all:'y');

h.definedone();

dcl hiter hi('h');

   end;

set product_b;

format b_datetime datetime16.;

_rc=hi.first();

do while (_rc=0);

   if 0<=b_datetime-a_datetime<=5 then do;

      output;

return;

    end;

_rc=hi.next();

  end;

  run;

skipper
Calcite | Level 5

Hai Kuo, thank you so much 🙂 you are a Legend of SAS !!!! Smiley Wink

Haikuo
Onyx | Level 15

Thank you very much for this overflattering ranking. And Please don't get me killed. seriously, I would have ranked myself aournd 10Kth of SAS legend.  Smiley Wink

skipper
Calcite | Level 5

ha, :smileylaugh: not quite a legend, I forgot to say my datasets are huge, product_a has 1.5m obs and product_b has 8k obs, just ran the code, and its 3 minutes in and still going, i feel like my laptop is going to blow up! to much SAS to handle,..

skipper
Calcite | Level 5

still running! about 7 minutes now!

skipper
Calcite | Level 5

sorry hau kuo, its taking way to long, my SAS is still running about 10 minutes now, have to mark you as incorrect, my fault really as i did not say how large my data sets where.

but i thought sql was quicker than SAS ???

skipper
Calcite | Level 5

How can i BREAK out of this enless code run?

skipper
Calcite | Level 5

Cheers for the EDIT

im running that code now, still taking some time, still running, will report back!

Haikuo
Onyx | Level 15

Since your product_a is huge, I have updated a little to the hash code, to increase RAM allocation:

data want (drop=_rc);

  if _n_=1 then do;

     set product_A (obs=1);

format a_datetime datetime16.;

dcl hash h(dataset: 'product_A', ordered:'d', hashexp:20);

h.definekey('A_datetime');

h.definedata(all:'y');

h.definedone();

dcl hiter hi('h');

   end;

set product_b;

format b_datetime datetime16.;

_rc=hi.first();

do while (_rc=0);

   if 0<=b_datetime-a_datetime<=5 then do;

      output;

return;

    end;

_rc=hi.next();

  end;

  run;

You also should try Tom's code. Let us know your benchmark.

Good Luck,

Haikuo

skipper
Calcite | Level 5

cool, thanks alot, i will try, Im still trying toms code Smiley Happy

skipper
Calcite | Level 5

ok, just run that above code, about 1 minute in now.

skipper
Calcite | Level 5

10 minutes runing and still going, will have to end this one Smiley Sad

skipper
Calcite | Level 5

yep it just hangs, and hangs, been going for 10 minutes now have to shut it down!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1273 views
  • 0 likes
  • 2 in conversation