BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
crikriek
Calcite | Level 5

Hello everyone,

I'm currently writing my thesis and I'm facing a serious issue. It would be awesome if you could help me with it.

I want to merge two database. The first one (let's call it TRADES) concerns the trades concerning different stocks, this one goes like that:

id_tradeDatetime
3201/02/2014 09:00
2401/02/2014 09:03
3563/02/2014 09:12
9084/02/2014 09:15
14525/02/2014 09:23
65436/02/2014 09:45

The second (Let's call it SPREAD) database is like that

DatetimeBidAsk
1/02/2014 09:001213
1/02/2014 09:021415
3/02/2014 09:101314
4/02/2014 09:1113,514
5/02/2014 09:2412,213
6/02/2014 09:3511,314

What I would like is to merge these database in order to obtain the last BID and ASK values just before the trade. This would give this

id_tradeDatetimeBidAsk
3201/02/2014 09:001213
2401/02/2014 09:031415
3563/02/2014 09:1213,514
9084/02/2014 09:1513,514
14525/02/2014 09:2313,514
65436/02/2014 09:4511,314

I've been thinking a lot about this merge but I don't see how to do it. Could you please help me with it ?

Thanks a lot

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
proc sql;
create table want as 
 select x.*,bid,ask
  from x,y
   where  x.Datetime-y.Datetime ge 0
    group by id_trade
     having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ;
quit;

Xia Keshan

View solution in original post

12 REPLIES 12
BillJones
Calcite | Level 5

crikriek,

You need a unique key on both data sets to perform the merge.  I don't know your data, but I imagine something like symbol id_trade should work.  If you can get those variables on both files, then you can use code like the following:

proc sort data=trades nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;

proc sort data=spread nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;

data trade_info;

merge trades (in=a) spread (in=b);

by symobl id_trade;

if a;

run;

Code above performs a left outer join.  IOW, it keeps all the records from data set a (trades) and attaches information from data set b if the keys match.  If you want an  inner join which only keeps those records that match then use "if a & b;" instead of "if a;"

-Bill

Ksharp
Super User

If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
proc sql;
create table want as 
 select x.*,bid,ask
  from x,y
   where  x.Datetime-y.Datetime ge 0
    group by id_trade
     having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ;
quit;

Xia Keshan

crikriek
Calcite | Level 5

Thank you guys. I really appreciate your help.

I think the solution of Xia Keshan best suits my needs. I was actually thinking about smthg like that. (And yes I come from Euro Smiley Happy )

Thanks Smiley Happy

crikriek
Calcite | Level 5

Xia Keshan,

I'm having trouble with the implementation of the code. I'm very new in SAS programming and I surely do'nt get all the subtilities of the program. Here's what I typed. LSV in my library. Trades_students_2012 is the dataset with the trades information and Orders_students_2012 is the one with the orders information. Time_OB is the name of the datetime in the orders_students_2012 dataset.

proc sql;

create table want as

select lsv.trades_students_2012.*,bid, ask

  from lsv.trades_students_2012,lsv.orders_students_2012

   where  lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Time_OB ge 0

    group by id_trades

     having min(lsv.trades_students_2012.Datetime-lsv.Orders_students_2012.Time_OB)=lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Datetime ;

quit;

I also have another problem since the column Datetime in the Trades dataset has been created by myself. I had the date and I had the time so I did this

proc sql;

   alter table lsv.Trades_students_2012

      add datetime date label='datetime';

Proc sql;

update lsv.Trades_students_2012

      set datetime= dhms(date,hour(time),minute(time), second(time));

Is it right ? Could you please help me with all this ?

Ksharp
Super User

You don't post your real data. I just guess something . you can wrap all these SQL statement into one proc sql and use alias to make code succinct . 

proc sql;

   alter table lsv.Trades_students_2012

      add datetime date label='datetime';

update lsv.Trades_students_2012

      set datetime= dhms(date,hour(time),minute(time), second(time));

create table want as

select a.*,bid, ask

  from lsv.trades_students_2012 as a ,lsv.orders_students_2012 as b

   where  a.Datetime-b.Time_OB ge 0

    group by id_trades

     having min(a.Datetime-b.Time_OB)=a.Datetime-b.Datetime ;

quit;

Xia Keshan

crikriek
Calcite | Level 5

Thanks a lot !

Nevetheless my two datasets are huge (3GB and 2GB) so the procedure takes too much time and space.

Is there any other way to do it ?

gergely_batho
SAS Employee

Data step is fast if your dataset is sorted by datetime. You can merge/interleave the 2 datasets.

The problem is, that you cannot use a merge/by statement, because time stamps are not exactly equal.

You cannot use a set/by statement, because in that case when we read from one of the datasets, the information from the other dataset is lost.

The solution is:

data want2;

  set y(in=inY keep=datetime) x(in=inX keep=datetime);

  by datetime;

  if inY then do;

          set y;

  end;

  if inX then do;

       set x;

       output;

  end;

run;

crikriek
Calcite | Level 5

Thank you Gergely.

I'm thinking about the SyncJoin algorithm. I read this is a faster way to operate faster the cartesian product. Unfortunately I don't know how to apply it in my case.

gergely_batho
SAS Employee

I have looked at the paper describing SyncJoin. If you have a many-to-many relationship, my code won’t work. If it’s one-to-many (or many-to-one) I believe it is at least as fast as SyncJoin.

In the SAS log, you will see, that data step reads each data set 2 times. But in practice we make one-pass through the data. The data is still in the cache, when we re-read it. There is a version of this code that avoids this “apparent re-read”, but it’s complicated.

Have you tried it? You just need to change the names of the tables (x and y to lsv....).

I've used this technique in several projects on millions of rows.

Ksharp
Super User

the following could help you understand what Gergely's code was trying to achieve. Of course, your data must be sorted already. or try Hash Table.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
data want(drop=bid ask);
 set y x(in=inx);
 by Datetime;
 retain _bid _ask;
 if not missing(bid) then _bid=bid;
 if not missing(ask) then _ask=ask;
if inx;
run;

Xia Keshan

crikriek
Calcite | Level 5

Thanks  a lot guys  it worked perfectly !! Smiley Happy

Ksharp
Super User

Good to know . That is the reason why I love this place, you can still learn something new or good idea. Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1400 views
  • 3 likes
  • 4 in conversation