Help using Base SAS procedures

Merging two datasets with different but similar columns

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

Merging two datasets with different but similar columns

Hello everyone,

I'm currently writing my thesis and I'm facing a serious issue. It would be awesome if you could help me with it.

I want to merge two database. The first one (let's call it TRADES) concerns the trades concerning different stocks, this one goes like that:

id_tradeDatetime
3201/02/2014 09:00
2401/02/2014 09:03
3563/02/2014 09:12
9084/02/2014 09:15
14525/02/2014 09:23
65436/02/2014 09:45

The second (Let's call it SPREAD) database is like that

DatetimeBidAsk
1/02/2014 09:001213
1/02/2014 09:021415
3/02/2014 09:101314
4/02/2014 09:1113,514
5/02/2014 09:2412,213
6/02/2014 09:3511,314

What I would like is to merge these database in order to obtain the last BID and ASK values just before the trade. This would give this

id_tradeDatetimeBidAsk
3201/02/2014 09:001213
2401/02/2014 09:031415
3563/02/2014 09:1213,514
9084/02/2014 09:1513,514
14525/02/2014 09:2313,514
65436/02/2014 09:4511,314

I've been thinking a lot about this merge but I don't see how to do it. Could you please help me with it ?

Thanks a lot


Accepted Solutions
Solution
‎06-29-2014 06:29 AM
Super User
Posts: 10,020

Re: Merging two datasets with different but similar columns

If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
proc sql;
create table want as 
 select x.*,bid,ask
  from x,y
   where  x.Datetime-y.Datetime ge 0
    group by id_trade
     having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ;
quit;

Xia Keshan

View solution in original post


All Replies
Frequent Contributor
Posts: 92

Re: Merging two datasets with different but similar columns

crikriek,

You need a unique key on both data sets to perform the merge.  I don't know your data, but I imagine something like symbol id_trade should work.  If you can get those variables on both files, then you can use code like the following:

proc sort data=trades nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;

proc sort data=spread nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;

data trade_info;

merge trades (in=a) spread (in=b);

by symobl id_trade;

if a;

run;

Code above performs a left outer join.  IOW, it keeps all the records from data set a (trades) and attaches information from data set b if the keys match.  If you want an  inner join which only keeps those records that match then use "if a & b;" instead of "if a;"

-Bill

Solution
‎06-29-2014 06:29 AM
Super User
Posts: 10,020

Re: Merging two datasets with different but similar columns

If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
proc sql;
create table want as 
 select x.*,bid,ask
  from x,y
   where  x.Datetime-y.Datetime ge 0
    group by id_trade
     having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ;
quit;

Xia Keshan

Contributor
Posts: 26

Re: Merging two datasets with different but similar columns

Thank you guys. I really appreciate your help.

I think the solution of Xia Keshan best suits my needs. I was actually thinking about smthg like that. (And yes I come from Euro Smiley Happy )

Thanks Smiley Happy

Contributor
Posts: 26

Re: Merging two datasets with different but similar columns

Xia Keshan,

I'm having trouble with the implementation of the code. I'm very new in SAS programming and I surely do'nt get all the subtilities of the program. Here's what I typed. LSV in my library. Trades_students_2012 is the dataset with the trades information and Orders_students_2012 is the one with the orders information. Time_OB is the name of the datetime in the orders_students_2012 dataset.

proc sql;

create table want as

select lsv.trades_students_2012.*,bid, ask

  from lsv.trades_students_2012,lsv.orders_students_2012

   where  lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Time_OB ge 0

    group by id_trades

     having min(lsv.trades_students_2012.Datetime-lsv.Orders_students_2012.Time_OB)=lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Datetime ;

quit;

I also have another problem since the column Datetime in the Trades dataset has been created by myself. I had the date and I had the time so I did this

proc sql;

   alter table lsv.Trades_students_2012

      add datetime date label='datetime';

Proc sql;

update lsv.Trades_students_2012

      set datetime= dhms(date,hour(time),minute(time), second(time));

Is it right ? Could you please help me with all this ?

Super User
Posts: 10,020

Re: Merging two datasets with different but similar columns

You don't post your real data. I just guess something . you can wrap all these SQL statement into one proc sql and use alias to make code succinct . 

proc sql;

   alter table lsv.Trades_students_2012

      add datetime date label='datetime';

update lsv.Trades_students_2012

      set datetime= dhms(date,hour(time),minute(time), second(time));

create table want as

select a.*,bid, ask

  from lsv.trades_students_2012 as a ,lsv.orders_students_2012 as b

   where  a.Datetime-b.Time_OB ge 0

    group by id_trades

     having min(a.Datetime-b.Time_OB)=a.Datetime-b.Datetime ;

quit;

Xia Keshan

Contributor
Posts: 26

Re: Merging two datasets with different but similar columns

Thanks a lot !

Nevetheless my two datasets are huge (3GB and 2GB) so the procedure takes too much time and space.

Is there any other way to do it ?

SAS Employee
Posts: 340

Re: Merging two datasets with different but similar columns

Data step is fast if your dataset is sorted by datetime. You can merge/interleave the 2 datasets.

The problem is, that you cannot use a merge/by statement, because time stamps are not exactly equal.

You cannot use a set/by statement, because in that case when we read from one of the datasets, the information from the other dataset is lost.

The solution is:

data want2;

  set y(in=inY keep=datetime) x(in=inX keep=datetime);

  by datetime;

  if inY then do;

          set y;

  end;

  if inX then do;

       set x;

       output;

  end;

run;

Contributor
Posts: 26

Re: Merging two datasets with different but similar columns

Posted in reply to gergely_batho

Thank you Gergely.

I'm thinking about the SyncJoin algorithm. I read this is a faster way to operate faster the cartesian product. Unfortunately I don't know how to apply it in my case.

SAS Employee
Posts: 340

Re: Merging two datasets with different but similar columns

I have looked at the paper describing SyncJoin. If you have a many-to-many relationship, my code won’t work. If it’s one-to-many (or many-to-one) I believe it is at least as fast as SyncJoin.

In the SAS log, you will see, that data step reads each data set 2 times. But in practice we make one-pass through the data. The data is still in the cache, when we re-read it. There is a version of this code that avoids this “apparent re-read”, but it’s complicated.

Have you tried it? You just need to change the names of the tables (x and y to lsv....).

I've used this technique in several projects on millions of rows.

Super User
Posts: 10,020

Re: Merging two datasets with different but similar columns

the following could help you understand what Gergely's code was trying to achieve. Of course, your data must be sorted already. or try Hash Table.

data x;
input id_trade     Datetime & anydtdtm.;
format      Datetime Datetime.;
cards;
320      1/02/2014 09:00
240      1/02/2014 09:03
356      3/02/2014 09:12
908      4/02/2014 09:15
1452      5/02/2014 09:23
6543      6/02/2014 09:45
;
run;
data y;
input Datetime & anydtdtm.     (Bid     Ask     ) (: commax.);
format      Datetime Datetime. Bid commax12.2 ;
cards;
1/02/2014 09:00      12     13
1/02/2014 09:02      14     15
3/02/2014 09:10      13     14
4/02/2014 09:11      13,5     14
5/02/2014 09:24      12,2     13
6/02/2014 09:35      11,3     14
;
run;
data want(drop=bid ask);
 set y x(in=inx);
 by Datetime;
 retain _bid _ask;
 if not missing(bid) then _bid=bid;
 if not missing(ask) then _ask=ask;
if inx;
run;

Xia Keshan

Contributor
Posts: 26

Re: Merging two datasets with different but similar columns

Thanks  a lot guys  it worked perfectly !! Smiley Happy

Super User
Posts: 10,020

Re: Merging two datasets with different but similar columns

Good to know . That is the reason why I love this place, you can still learn something new or good idea. Smiley Happy

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 451 views
  • 3 likes
  • 4 in conversation