Solved: Re: Merging Datasets for Each ID

finans_sas · Posted 11-27-2011 09:04 PM

Dear SAS Community Members,

I have been having hard time merging the following two datasets:

data trans;

informat date yymmdd10.;

input transaction_id stock_id date event_day;

format date yymmdd10.;

cards;

1 100 20090909 1

2 100 20090909 1

3 101 20090909 1

;

run;

data returns;

informat date yymmdd10.;

input stock_id date return;

format date yymmdd10.;

cards;

100 20090908 2.12

100 20090909 2.14

100 20090910 0.04

101 20090908 -0.87

101 20090909 -0.96

101 20090910 2.33

;

run;

proc sort data=trans; by stock_id date; run;

proc sort data=returns; by stock_id date; run;

data combined;

merge trans returns; by stock_id date; run;

*********************************************************************************************;

I want the dataset "combined" to look like the following:

data stock_id transaction_id event_day return

2009-09-08 100 1 . 2.12

2009-09-09 100 1 1 2.14

2009-09-10 100 1 . 0.04

2009-09-08 100 2 . 2.12

2009-09-09 100 2 1 2.14

2009-09-10 100 2 . 0.04

2009-09-08 101 3 . -0.87

2009-09-09 101 3 1 -0.96

2009-09-10 101 3 . 2.33

To summarize, I would like to combine these two datasets for each transaction_id by stock_id and date. In the original dataset, "trans" is composed of 65,000 and "returns" is composed of 8 million observations. So far my attempts have not been successful in combining "trans" and "returns" the way I wanted. I would very much appreciate your help solve this problem.

Thanks in advance

Best Regards

art297 · Posted 11-27-2011 11:49 PM

How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids? I think you have to explain why you expect the results shown.

The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:

proc sql;

create table need as

select a.transaction_id,a.stock_id,

b.date,b.return

from trans a,returns b

having a.stock_id=b.stock_id

;

create table want as

select * from need a

left join trans b

on a.stock_id=b.stock_id and

a.transaction_id=b.transaction_id and

a.date=b.date

order by a.transaction_id,a.stock_id,a.date

;

quit;

View solution in original post

art297 · Posted 11-27-2011 11:49 PM

How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids? I think you have to explain why you expect the results shown.

The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:

proc sql;

create table need as

select a.transaction_id,a.stock_id,

b.date,b.return

from trans a,returns b

having a.stock_id=b.stock_id

;

create table want as

select * from need a

left join trans b

on a.stock_id=b.stock_id and

a.transaction_id=b.transaction_id and

a.date=b.date

order by a.transaction_id,a.stock_id,a.date

;

quit;

finans_sas · Posted 11-28-2011 12:38 AM

Thank you art297 for helping me out. transaction_ids represent a single trade executed by a trader. Unfortunately, the two different traders may execute the same trade (invest in the same company, same stock_id) on the same day and in case this happens, I indicate the second trade with a separate transaction_id even though the date and stock_id are the same for both transaction.

Therefore, all I want is to treat each transaction separately and calculate the returns for each transaction around day it took place (event day). Therefore, I need a merged file in which the transactions and returns are merged based on stock ids and date for each transaction_id. As of now, I am not quite sure how to get the output I want. Did I make my point any clearer? Please let me know if ther are still any question marks left. Thank you for the code. I will try your suggestion and let you know. Best Regards.

Ksharp · Posted 11-28-2011 12:39 AM

I want to know what you got to do if 2009-09-09 has multiple value of event_day?

Suppose every day has only one event_day.

data trans;
informat date yymmdd10.;
input transaction_id stock_id  date event_day;
format date yymmdd10.;
cards;
1 100 20090909 1
2 100 20090909 1
3 101 20090909 1
;
run;

 

data returns;
informat date yymmdd10.;
input stock_id  date return;
format date yymmdd10.;
cards;
100 20090908 2.12
100 20090909 2.14
100 20090910 0.04
101 20090908 -0.87
101 20090909  -0.96
101 20090910 2.33
;
run;

data want(drop= rc rx);
 if _n_ eq 1 then do; 
  if 0 then set trans;
   declare hash ha(hashexp:20,dataset:'trans',multidata:'Y');
    ha.definekey('stock_id');
    ha.definedata('transaction_id','stock_id');
    ha.definedone();
  
   declare hash h(hashexp:20,dataset:'trans');
    h.definekey('date');
    h.definedata('event_day');
    h.definedone();
 end;
set returns;
 call missing(transaction_id,event_day);
 rx=h.find(); rc=ha.find();
 if rc=0 then do;
               output;
               do while(ha.find_next()=0);
                output;
               end;
             end;
run;


proc sort data=want; by  transaction_id date ;run;

Ksharp

Message was edited by: xia keshan

Linlin · Posted 11-28-2011 08:47 AM

proc sql;

create table want as select a.* ,b.transaction_id, b.date as date1,b.event_day

from returns a,trans b

where a.stock_id=b.stock_id

order by 4,2,1;

quit;

data want(drop=date1);

set want;

event_day=ifn(date eq date1,event_day,.);

run;

Tom · Posted 11-28-2011 09:43 AM

You can use CASE to create the desired output in a single SQL statement.

proc sql;

create table want as

select a.date

, a.stock_id

, b.transaction_id

, case when b.date = a.date then b.event_day else . end

as event_day

, a.return

from returns a,trans b

where a.stock_id=b.stock_id

order by 3,2,1;

quit;

Linlin · Posted 11-28-2011 10:06 AM

Thank you Tom!

finans_sas · Posted 11-28-2011 12:34 PM

Dear All,

I am very thankful to each of you for helping me out with the problem I was having. All of your suggestions do what I need. I accepted the first solution as the correct answer since I only had one right to give. This is such a great place to learn and share SAS programming.

Best Regards

(ps: Ksharp, I only have one event_day for each transaction. You were right in supposing that every day has only one event_day. Thank you for your kind help).

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away