Dear SAS Community Members,
I have been having hard time merging the following two datasets:
data trans;
informat date yymmdd10.;
input transaction_id stock_id date event_day;
format date yymmdd10.;
cards;
1 100 20090909 1
2 100 20090909 1
3 101 20090909 1
;
run;
data returns;
informat date yymmdd10.;
input stock_id date return;
format date yymmdd10.;
cards;
100 20090908 2.12
100 20090909 2.14
100 20090910 0.04
101 20090908 -0.87
101 20090909 -0.96
101 20090910 2.33
;
run;
proc sort data=trans; by stock_id date; run;
proc sort data=returns; by stock_id date; run;
data combined;
merge trans returns; by stock_id date; run;
*********************************************************************************************;
I want the dataset "combined" to look like the following:
data stock_id transaction_id event_day return
2009-09-08 100 1 . 2.12
2009-09-09 100 1 1 2.14
2009-09-10 100 1 . 0.04
2009-09-08 100 2 . 2.12
2009-09-09 100 2 1 2.14
2009-09-10 100 2 . 0.04
2009-09-08 101 3 . -0.87
2009-09-09 101 3 1 -0.96
2009-09-10 101 3 . 2.33
To summarize, I would like to combine these two datasets for each transaction_id by stock_id and date. In the original dataset, "trans" is composed of 65,000 and "returns" is composed of 8 million observations. So far my attempts have not been successful in combining "trans" and "returns" the way I wanted. I would very much appreciate your help solve this problem.
Thanks in advance
Best Regards
How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids? I think you have to explain why you expect the results shown.
The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:
proc sql;
create table need as
select a.transaction_id,a.stock_id,
b.date,b.return
from trans a,returns b
having a.stock_id=b.stock_id
;
create table want as
select * from need a
left join trans b
on a.stock_id=b.stock_id and
a.transaction_id=b.transaction_id and
a.date=b.date
order by a.transaction_id,a.stock_id,a.date
;
quit;
How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids? I think you have to explain why you expect the results shown.
The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:
proc sql;
create table need as
select a.transaction_id,a.stock_id,
b.date,b.return
from trans a,returns b
having a.stock_id=b.stock_id
;
create table want as
select * from need a
left join trans b
on a.stock_id=b.stock_id and
a.transaction_id=b.transaction_id and
a.date=b.date
order by a.transaction_id,a.stock_id,a.date
;
quit;
Thank you art297 for helping me out. transaction_ids represent a single trade executed by a trader. Unfortunately, the two different traders may execute the same trade (invest in the same company, same stock_id) on the same day and in case this happens, I indicate the second trade with a separate transaction_id even though the date and stock_id are the same for both transaction.
Therefore, all I want is to treat each transaction separately and calculate the returns for each transaction around day it took place (event day). Therefore, I need a merged file in which the transactions and returns are merged based on stock ids and date for each transaction_id. As of now, I am not quite sure how to get the output I want. Did I make my point any clearer? Please let me know if ther are still any question marks left. Thank you for the code. I will try your suggestion and let you know. Best Regards.
I want to know what you got to do if 2009-09-09 has multiple value of event_day?
Suppose every day has only one event_day.
data trans; informat date yymmdd10.; input transaction_id stock_id date event_day; format date yymmdd10.; cards; 1 100 20090909 1 2 100 20090909 1 3 101 20090909 1 ; run; data returns; informat date yymmdd10.; input stock_id date return; format date yymmdd10.; cards; 100 20090908 2.12 100 20090909 2.14 100 20090910 0.04 101 20090908 -0.87 101 20090909 -0.96 101 20090910 2.33 ; run; data want(drop= rc rx); if _n_ eq 1 then do; if 0 then set trans; declare hash ha(hashexp:20,dataset:'trans',multidata:'Y'); ha.definekey('stock_id'); ha.definedata('transaction_id','stock_id'); ha.definedone(); declare hash h(hashexp:20,dataset:'trans'); h.definekey('date'); h.definedata('event_day'); h.definedone(); end; set returns; call missing(transaction_id,event_day); rx=h.find(); rc=ha.find(); if rc=0 then do; output; do while(ha.find_next()=0); output; end; end; run; proc sort data=want; by transaction_id date ;run;
Ksharp
Message was edited by: xia keshan
proc sql;
create table want as select a.* ,b.transaction_id, b.date as date1,b.event_day
from returns a,trans b
where a.stock_id=b.stock_id
order by 4,2,1;
quit;
data want(drop=date1);
set want;
event_day=ifn(date eq date1,event_day,.);
run;
You can use CASE to create the desired output in a single SQL statement.
proc sql;
create table want as
select a.date
, a.stock_id
, b.transaction_id
, case when b.date = a.date then b.event_day else . end
as event_day
, a.return
from returns a,trans b
where a.stock_id=b.stock_id
order by 3,2,1;
quit;
Thank you Tom!
Dear All,
I am very thankful to each of you for helping me out with the problem I was having. All of your suggestions do what I need. I accepted the first solution as the correct answer since I only had one right to give. This is such a great place to learn and share SAS programming.
Best Regards
(ps: Ksharp, I only have one event_day for each transaction. You were right in supposing that every day has only one event_day. Thank you for your kind help).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.