Hello everyone,
I'm currently writing my thesis and I'm facing a serious issue. It would be awesome if you could help me with it.
I want to merge two database. The first one (let's call it TRADES) concerns the trades concerning different stocks, this one goes like that:
id_trade | Datetime |
320 | 1/02/2014 09:00 |
240 | 1/02/2014 09:03 |
356 | 3/02/2014 09:12 |
908 | 4/02/2014 09:15 |
1452 | 5/02/2014 09:23 |
6543 | 6/02/2014 09:45 |
The second (Let's call it SPREAD) database is like that
Datetime | Bid | Ask |
1/02/2014 09:00 | 12 | 13 |
1/02/2014 09:02 | 14 | 15 |
3/02/2014 09:10 | 13 | 14 |
4/02/2014 09:11 | 13,5 | 14 |
5/02/2014 09:24 | 12,2 | 13 |
6/02/2014 09:35 | 11,3 | 14 |
What I would like is to merge these database in order to obtain the last BID and ASK values just before the trade. This would give this
id_trade | Datetime | Bid | Ask |
320 | 1/02/2014 09:00 | 12 | 13 |
240 | 1/02/2014 09:03 | 14 | 15 |
356 | 3/02/2014 09:12 | 13,5 | 14 |
908 | 4/02/2014 09:15 | 13,5 | 14 |
1452 | 5/02/2014 09:23 | 13,5 | 14 |
6543 | 6/02/2014 09:45 | 11,3 | 14 |
I've been thinking a lot about this merge but I don't see how to do it. Could you please help me with it ?
Thanks a lot
If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.
data x; input id_trade Datetime & anydtdtm.; format Datetime Datetime.; cards; 320 1/02/2014 09:00 240 1/02/2014 09:03 356 3/02/2014 09:12 908 4/02/2014 09:15 1452 5/02/2014 09:23 6543 6/02/2014 09:45 ; run; data y; input Datetime & anydtdtm. (Bid Ask ) (: commax.); format Datetime Datetime. Bid commax12.2 ; cards; 1/02/2014 09:00 12 13 1/02/2014 09:02 14 15 3/02/2014 09:10 13 14 4/02/2014 09:11 13,5 14 5/02/2014 09:24 12,2 13 6/02/2014 09:35 11,3 14 ; run; proc sql; create table want as select x.*,bid,ask from x,y where x.Datetime-y.Datetime ge 0 group by id_trade having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ; quit;
Xia Keshan
crikriek,
You need a unique key on both data sets to perform the merge. I don't know your data, but I imagine something like symbol id_trade should work. If you can get those variables on both files, then you can use code like the following:
proc sort data=trades nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;
proc sort data=spread nodupkey; by symbol id_trade; run; *check log to ensure there are no dups;
data trade_info;
merge trades (in=a) spread (in=b);
by symobl id_trade;
if a;
run;
Code above performs a left outer join. IOW, it keeps all the records from data set a (trades) and attaches information from data set b if the keys match. If you want an inner join which only keeps those records that match then use "if a & b;" instead of "if a;"
-Bill
If I guess right, You are coming from Euro ? and If your requirement is not too high,SQL can achieve that.
data x; input id_trade Datetime & anydtdtm.; format Datetime Datetime.; cards; 320 1/02/2014 09:00 240 1/02/2014 09:03 356 3/02/2014 09:12 908 4/02/2014 09:15 1452 5/02/2014 09:23 6543 6/02/2014 09:45 ; run; data y; input Datetime & anydtdtm. (Bid Ask ) (: commax.); format Datetime Datetime. Bid commax12.2 ; cards; 1/02/2014 09:00 12 13 1/02/2014 09:02 14 15 3/02/2014 09:10 13 14 4/02/2014 09:11 13,5 14 5/02/2014 09:24 12,2 13 6/02/2014 09:35 11,3 14 ; run; proc sql; create table want as select x.*,bid,ask from x,y where x.Datetime-y.Datetime ge 0 group by id_trade having min(x.Datetime-y.Datetime)=x.Datetime-y.Datetime ; quit;
Xia Keshan
Thank you guys. I really appreciate your help.
I think the solution of Xia Keshan best suits my needs. I was actually thinking about smthg like that. (And yes I come from Euro )
Thanks
Xia Keshan,
I'm having trouble with the implementation of the code. I'm very new in SAS programming and I surely do'nt get all the subtilities of the program. Here's what I typed. LSV in my library. Trades_students_2012 is the dataset with the trades information and Orders_students_2012 is the one with the orders information. Time_OB is the name of the datetime in the orders_students_2012 dataset.
proc sql;
create table want as
select lsv.trades_students_2012.*,bid, ask
from lsv.trades_students_2012,lsv.orders_students_2012
where lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Time_OB ge 0
group by id_trades
having min(lsv.trades_students_2012.Datetime-lsv.Orders_students_2012.Time_OB)=lsv.trades_students_2012.Datetime-lsv.orders_students_2012.Datetime ;
quit;
I also have another problem since the column Datetime in the Trades dataset has been created by myself. I had the date and I had the time so I did this
proc sql;
alter table lsv.Trades_students_2012
add datetime date label='datetime';
Proc sql;
update lsv.Trades_students_2012
set datetime= dhms(date,hour(time),minute(time), second(time));
Is it right ? Could you please help me with all this ?
You don't post your real data. I just guess something . you can wrap all these SQL statement into one proc sql and use alias to make code succinct .
proc sql;
alter table lsv.Trades_students_2012
add datetime date label='datetime';
update lsv.Trades_students_2012
set datetime= dhms(date,hour(time),minute(time), second(time));
create table want as
select a.*,bid, ask
from lsv.trades_students_2012 as a ,lsv.orders_students_2012 as b
where a.Datetime-b.Time_OB ge 0
group by id_trades
having min(a.Datetime-b.Time_OB)=a.Datetime-b.Datetime ;
quit;
Xia Keshan
Thanks a lot !
Nevetheless my two datasets are huge (3GB and 2GB) so the procedure takes too much time and space.
Is there any other way to do it ?
Data step is fast if your dataset is sorted by datetime. You can merge/interleave the 2 datasets.
The problem is, that you cannot use a merge/by statement, because time stamps are not exactly equal.
You cannot use a set/by statement, because in that case when we read from one of the datasets, the information from the other dataset is lost.
The solution is:
data want2;
set y(in=inY keep=datetime) x(in=inX keep=datetime);
by datetime;
if inY then do;
set y;
end;
if inX then do;
set x;
output;
end;
run;
Thank you Gergely.
I'm thinking about the SyncJoin algorithm. I read this is a faster way to operate faster the cartesian product. Unfortunately I don't know how to apply it in my case.
I have looked at the paper describing SyncJoin. If you have a many-to-many relationship, my code won’t work. If it’s one-to-many (or many-to-one) I believe it is at least as fast as SyncJoin.
In the SAS log, you will see, that data step reads each data set 2 times. But in practice we make one-pass through the data. The data is still in the cache, when we re-read it. There is a version of this code that avoids this “apparent re-read”, but it’s complicated.
Have you tried it? You just need to change the names of the tables (x and y to lsv....).
I've used this technique in several projects on millions of rows.
the following could help you understand what Gergely's code was trying to achieve. Of course, your data must be sorted already. or try Hash Table.
data x; input id_trade Datetime & anydtdtm.; format Datetime Datetime.; cards; 320 1/02/2014 09:00 240 1/02/2014 09:03 356 3/02/2014 09:12 908 4/02/2014 09:15 1452 5/02/2014 09:23 6543 6/02/2014 09:45 ; run; data y; input Datetime & anydtdtm. (Bid Ask ) (: commax.); format Datetime Datetime. Bid commax12.2 ; cards; 1/02/2014 09:00 12 13 1/02/2014 09:02 14 15 3/02/2014 09:10 13 14 4/02/2014 09:11 13,5 14 5/02/2014 09:24 12,2 13 6/02/2014 09:35 11,3 14 ; run; data want(drop=bid ask); set y x(in=inx); by Datetime; retain _bid _ask; if not missing(bid) then _bid=bid; if not missing(ask) then _ask=ask; if inx; run;
Xia Keshan
Thanks a lot guys it worked perfectly !!
Good to know . That is the reason why I love this place, you can still learn something new or good idea.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.