Help using Base SAS procedures

same trade times - large data sets

Accepted Solution Solved
Reply
Contributor
Posts: 56
Accepted Solution

same trade times - large data sets

The Datasets here are for example only, my real datasets are for up to 1.5 million observations.

I have 2 products 'A' and 'B' trading, trade_time and trade_price are recorded variables, so I have

DATA product_A;

   INPUT A_datetime datetime16. A_price $;

   DATALINES;

22JUN11:00:00:04    1.7

22JUN11:00:00:08 1.9

22JUN11:00:00:10 2.3

22JUN11:00:05:12 2.9

22JUN11:00:05:36 3.5

22JUN11:00:32:45 3.2

22JUN11:00:39:01 3.1

22JUN11:00:39:03 2.7

22JUN11:01:10:12 2.6

22JUN11:07:22:10 2.1

26JUN11:12:42:17 3.4

26JUN11:12:42:18 3.6

26JUN11:12:42:21 3.9

;

RUN;

DATA product_B;

   INPUT B_datetime datetime16. B_price $;

   DATALINES;

22JUN11:00:00:09    0.02

22JUN11:00:05:34    0.07

22JUN11:00:39:05    0.04

26JUN11:12:42:20    0.09

;

RUN;

Each time 'B' trades, i want to know what 'A' traded at, but only if 'A' traded within the last 5 seconds looking backward.

(i.e if B trades and there are no records in the last 5 seconds for A, then discard, even if 'A' trades one second afterwards, - looking backwards in time only)

So I want to create a new dataset called 'same_trade_times'. which would have the following data,...

DATA same_trade_times;

   INPUT B_datetime datetime16.

         B_price $

         A_datetime $

         A_price $

         ;

   DATALINES;

22JUN11:00:00:09   0.02   22JUN11:00:00:08    1.9

22JUN11:00:39:05   0.04   22JUN11:00:39:03    2.7

26JUN11:00:42:20   0.09   26JUN11:12:42:18    3.6

;

RUN;

any ideas on this one?


Accepted Solutions
Solution
‎03-26-2012 03:18 PM
Super User
Super User
Posts: 7,083

Re: same trade times - large data sets

Use MERGE and RETAIN.

data want ;

  merge A(in=ina rename=(a_datetime=datetime))

        B(in=inb rename=(b_datetime=datetime))

  ;

  by datetime;

  retain a_datetime b_datetime prev_a;

  if ina then a_datetime=datetime;

  if inb then b_datetime=datetime;

  if ina then prev_a = a;

  if inb and  0 <= (b_datetime - a_datetime) <= 5 then output;

run;

View solution in original post


All Replies
Solution
‎03-26-2012 03:18 PM
Super User
Super User
Posts: 7,083

Re: same trade times - large data sets

Use MERGE and RETAIN.

data want ;

  merge A(in=ina rename=(a_datetime=datetime))

        B(in=inb rename=(b_datetime=datetime))

  ;

  by datetime;

  retain a_datetime b_datetime prev_a;

  if ina then a_datetime=datetime;

  if inb then b_datetime=datetime;

  if ina then prev_a = a;

  if inb and  0 <= (b_datetime - a_datetime) <= 5 then output;

run;

Contributor
Posts: 56

Re: same trade times - large data sets

is that an 'in' or an 'if' at the start of the third line up from the bottom?

Super User
Super User
Posts: 7,083

Re: same trade times - large data sets

IF of course.  I have updated the previous post to fix the typo.

Super User
Posts: 5,518

Re: same trade times - large data sets

Tom,

I'm not sure you can rally MERGE these two.  The DATETIME values may not form an exact match.

I don't have the time to do this, but here is an approach (bare bones outline).  It seems a little strange at first to create 5-second intervals based on the Product_A data set, but logically it seems sound.  Of course, when it comes to setting up a format with 1M+ levels on a PC, there are no guarantees as to how much time it will take.

Use Product_A data to create a CNTLIN data set for PROC FORMAT.  The ranges would be from lag(a_datetime) to the MIN of either a_datetime or lag(a_datetime)+5.  The CNTLIN data set would need to use an extra variable to indicate whether the endpoint of the range was either included (if using lag(a_datetime)+5 as the endpoint) or excluded (if using a_datetime as the endpoint).  The format would use concatenated a_datetime + A_Price as the LABEL.  Attention would be needed to properly create the very final range, and possibly to set up the Other= definition when there is no match.

Applying the format would be easy:

data same_time_trades;

set product_b;

A_match = put(b_datetime, fmtname.);

Then split A_match into the datetime and the price.

Super User
Super User
Posts: 7,083

Re: same trade times - large data sets

Posted in reply to Astounding

There is no need to match the datetime values.  The RETAIN statements take care of that.  The same logic should work using SET with a BY statment to interleave the records.

Super User
Posts: 5,518

same trade times - large data sets

OK, got it.  And I like it!

You've probably posted this already, but I think you meant to code prev_a=a_price instead of prev_a=a.

Super Contributor
Posts: 1,636

Re: same trade times - large data sets

Posted in reply to Astounding

I ran Tom's Code with the sample data OP provided.

DATA A;

   INPUT A_datetime datetime16. A_price $;

   DATALINES;

22JUN11:00:00:04 1.7

22JUN11:00:00:08 1.9

22JUN11:00:00:10 2.3

22JUN11:00:05:12 2.9

22JUN11:00:05:36 3.5

22JUN11:00:32:45 3.2

22JUN11:00:39:01 3.1

22JUN11:00:39:03 2.7

22JUN11:01:10:12 2.6

22JUN11:07:22:10 2.1

26JUN11:12:42:17 3.4

26JUN11:12:42:18 3.6

26JUN11:12:42:21 3.9

;

RUN;

DATA B;

   INPUT B_datetime datetime16. B_price $;

   DATALINES;

22JUN11:00:00:09    0.02

22JUN11:00:05:34    0.07

22JUN11:00:39:05    0.04

26JUN11:12:42:20    0.09

;

run;

data want ;

  merge A(in=ina rename=(a_datetime=datetime))

        B(in=inb rename=(b_datetime=datetime))

  ;

  by datetime;

  retain a_datetime b_datetime prev_a;

  if ina then a_datetime=datetime;

  if inb then b_datetime=datetime;

  if ina then prev_a = a_price;

  if inb and  0 <= (b_datetime - a_datetime) <= 5 then output;

  format a_datetime b_datetime datetime datetime16.;

run;

proc print ;run;

                                          a                b

                                          _                _

                  d                       d                d

                  a A  B                  a                a

                  t _  _                  t                t  p

                  e p  p                  e                e  r

                  t r  r                  t                t  e

O                i i  i                  i                i  v

b                m c  c                  m                m  _

s                e e  e                  e                e  a

1 22JUN11:00:00:09   0.02 22JUN11:00:00:08 22JUN11:00:00:09 1.9

2 22JUN11:00:39:05   0.04 22JUN11:00:39:03 22JUN11:00:39:05 2.7

3 26JUN11:12:42:20   0.09 26JUN11:12:42:18 26JUN11:12:42:20 3.6

Contributor
Posts: 56

same trade times - large data sets

seems to work on the small example data set, although variable 'a_price' seems to not show up in the 'want' table?

Contributor
Posts: 56

same trade times - large data sets

Just ran it on my large data sets

A - 1.5 million obs

B = 21k obs

data set 'want' gets done in 0.42 seconds, WOW, amazing!

and for some reason 'a_pricea now gets shown fine in the output of 'want' for these large data sets, strange?

Super User
Super User
Posts: 7,083

same trade times - large data sets

You are getting more values for A_PRICE because of more exact matches in DATETIME values.

Without an exact match the price for A that corresponds to the A_DATETIME would be in the variable PREV_A.

You could deal with that with more rename commands or make the logic more complex to handle getting the value retained back into the A_PRICE variable before the OUTPUT statement.

Note that if you wanted to match one B record to multiple A records then this method will not work.  It will only output the A price that is closest in time (but still earlier).

Respected Advisor
Posts: 3,156

same trade times - large data sets

Hi skipper,

The following code will be slower than Tom's, however, it should be way faster than my other two trials. It has a little bit more than 1 pass of your huge dataset, and it will keep your a_price as you wish. Please give it try and let me how it goes:

data have (index=(a_datetime));

format a_datetime datetime16.;

  set product_a product_b(rename=b_datetime=a_datetime);

  by a_datetime;

  lga_time=lag(a_datetime);

  lga_price=lag(a_price);

  run;

  data want (drop=lgSmiley Happy;

  format b_datetime datetime16.;

  set product_b (rename=b_datetime=a_datetime);

    set have key=a_datetime;

    if _iorc_=%sysrc(_sok) then do;

       if not missing(a_price) then do;

          b_datetime=a_datetime;

          output;

        end;

        else if missing(a_price) then do;

           b_datetime=a_datetime;

           a_datetime= lga_time;

           a_price=lga_price;

           if 0<=b_datetime-a_datetime<=5 then output;

        end;

     end;

run;

Contributor
Posts: 56

same trade times - large data sets

thanks I will try this and report back

Super User
Posts: 10,046

same trade times - large data sets

DOW can do it. But If you need fast it ,Hash Table will be favored.

But for your data, I am curious. You said you want datetime within five seconds .

There two obs within 5 seconds for  26JUN11:00:42:20   0.09

26JUN11:12:42:17 3.4

26JUN11:12:42:18 3.6


You need them all, or just only need the nearest obs ?

DATA product_A;
   INPUT A_datetime datetime16. A_price $;
   format a_datetime datetime.;
   DATALINES;
22JUN11:00:00:04    1.7
22JUN11:00:00:08 1.9
22JUN11:00:00:10 2.3
22JUN11:00:05:12 2.9
22JUN11:00:05:36 3.5
22JUN11:00:32:45 3.2
22JUN11:00:39:01 3.1
22JUN11:00:39:03 2.7
22JUN11:01:10:12 2.6
22JUN11:07:22:10 2.1
26JUN11:12:42:17 3.4
26JUN11:12:42:18 3.6
26JUN11:12:42:21 3.9
;
RUN;

DATA product_B;
   INPUT B_datetime datetime16. B_price $;
   format b_datetime datetime.;
   DATALINES;
22JUN11:00:00:09    0.02
22JUN11:00:05:34    0.07
22JUN11:00:39:05    0.04
26JUN11:12:42:20    0.09
;
RUN;

data want;
 set product_b;
 do until(a_datetime gt b_datetime);
  if b_datetime-a_datetime lt 5 and b_datetime-a_datetime ge 0 then output;
  set product_a;
 end;
run;


Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 461 views
  • 0 likes
  • 6 in conversation