BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finans_sas
Quartz | Level 8

Dear SAS Community Members,

I have been having hard time merging the following two datasets:

data trans;

informat date yymmdd10.;

input transaction_id stock_id  date event_day;

format date yymmdd10.;

cards;

1 100 20090909 1

2 100 20090909 1

3 101 20090909 1

;

run;

data returns;

informat date yymmdd10.;

input stock_id  date return;

format date yymmdd10.;

cards;

100 20090908 2.12

100 20090909 2.14

100 20090910 0.04

101 20090908 -0.87

101 20090909  -0.96

101 20090910 2.33

;

run;

proc sort data=trans; by stock_id date; run;

proc sort data=returns; by stock_id date; run;

data combined;

merge trans returns; by stock_id date; run;

*********************************************************************************************;

I want the dataset "combined" to look like the following:

data               stock_id       transaction_id    event_day   return

2009-09-08     100                      1                    .            2.12

2009-09-09     100                      1                    1            2.14

2009-09-10     100                      1                    .            0.04

2009-09-08     100                      2                    .            2.12

2009-09-09     100                      2                    1            2.14

2009-09-10     100                      2                    .            0.04

2009-09-08     101                      3                    .            -0.87

2009-09-09     101                      3                    1            -0.96

2009-09-10     101                      3                    .            2.33

To summarize, I would like to combine these two datasets for each transaction_id by stock_id and date. In the original dataset, "trans" is composed of 65,000 and "returns" is composed of 8 million observations. So far my attempts have not been successful in combining "trans" and "returns" the way I wanted. I would very much appreciate your help solve this problem.

Thanks in advance

Best Regards

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids?  I think you have to explain why you expect the results shown.

The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:

proc sql;

  create table need  as

    select a.transaction_id,a.stock_id,

           b.date,b.return

      from trans a,returns  b

          having a.stock_id=b.stock_id

  ;

  create table want as

    select * from need a

      left join trans b

              on a.stock_id=b.stock_id and

           a.transaction_id=b.transaction_id and

           a.date=b.date

          order by a.transaction_id,a.stock_id,a.date

  ;

quit;

View solution in original post

7 REPLIES 7
art297
Opal | Level 21

How do the records with the dates 2009-09-08 and 2009-09-10 end up with transaction ids?  I think you have to explain why you expect the results shown.

The following produces the file you indicated you want, but I didn't bother to tighten the code as I'm not sure if it is doing what you want:

proc sql;

  create table need  as

    select a.transaction_id,a.stock_id,

           b.date,b.return

      from trans a,returns  b

          having a.stock_id=b.stock_id

  ;

  create table want as

    select * from need a

      left join trans b

              on a.stock_id=b.stock_id and

           a.transaction_id=b.transaction_id and

           a.date=b.date

          order by a.transaction_id,a.stock_id,a.date

  ;

quit;

finans_sas
Quartz | Level 8

Thank you art297 for helping me out. transaction_ids represent a single trade executed by a trader. Unfortunately, the two different traders may execute the same trade (invest in the same company, same stock_id) on the same day and in case this happens, I indicate the second trade with a separate transaction_id even though the date and stock_id are the same for both transaction.

Therefore, all I want is to treat each transaction separately and calculate the returns for each transaction around day it took place (event day). Therefore, I need a merged file in which the transactions and returns are merged based on stock ids and date for each transaction_id. As of now, I am not quite sure how to get the output I want. Did I make my point any clearer? Please let me know if ther are still any question marks left. Thank you for the code. I will try your suggestion and let you know. Best Regards.

Ksharp
Super User

I want to know what you got to do if 2009-09-09  has multiple value of event_day?

Suppose every day has only one event_day.

data trans;
informat date yymmdd10.;
input transaction_id stock_id  date event_day;
format date yymmdd10.;
cards;
1 100 20090909 1
2 100 20090909 1
3 101 20090909 1
;
run;

 

data returns;
informat date yymmdd10.;
input stock_id  date return;
format date yymmdd10.;
cards;
100 20090908 2.12
100 20090909 2.14
100 20090910 0.04
101 20090908 -0.87
101 20090909  -0.96
101 20090910 2.33
;
run;

data want(drop= rc rx);
 if _n_ eq 1 then do; 
  if 0 then set trans;
   declare hash ha(hashexp:20,dataset:'trans',multidata:'Y');
    ha.definekey('stock_id');
    ha.definedata('transaction_id','stock_id');
    ha.definedone();
  
   declare hash h(hashexp:20,dataset:'trans');
    h.definekey('date');
    h.definedata('event_day');
    h.definedone();
 end;
set returns;
 call missing(transaction_id,event_day);
 rx=h.find(); rc=ha.find();
 if rc=0 then do;
               output;
               do while(ha.find_next()=0);
                output;
               end;
             end;
run;


proc sort data=want; by  transaction_id date ;run;

Ksharp

Message was edited by: xia keshan

Linlin
Lapis Lazuli | Level 10

proc sql;

create table want as select a.* ,b.transaction_id, b.date as date1,b.event_day

                 from returns a,trans b

                         where a.stock_id=b.stock_id

                         order by 4,2,1;

quit;

data want(drop=date1);

    set want;

      event_day=ifn(date eq date1,event_day,.);

run;

Tom
Super User Tom
Super User

You can use CASE to create the desired output in a single SQL statement.

proc sql;

  create table want as

    select a.date

         , a.stock_id

         , b.transaction_id

         , case when b.date = a.date then b.event_day else . end

           as event_day

         , a.return

    from returns a,trans b

    where a.stock_id=b.stock_id

    order by 3,2,1;

quit;

Linlin
Lapis Lazuli | Level 10

Thank you Tom!

finans_sas
Quartz | Level 8

Dear All,

I am very thankful to each of you for helping me out  with the problem I was having. All of your suggestions do what I need. I accepted the first solution as the correct answer since I only had one right to give. This is such a great place to learn and share SAS programming.

Best Regards

(ps: Ksharp, I only have one event_day for each transaction. You were right in supposing that every day has only one event_day. Thank you for your kind help).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1373 views
  • 7 likes
  • 5 in conversation