DATA Step, Macro, Functions and more

Working with Two Datasets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 183
Accepted Solution

Working with Two Datasets

[ Edited ]

Dear Experts,

 

I have two datasets which I attached: one is trade_data and another is announce_data. I need to creat a new dataset as following: if the tradedate of a cusip (i.e. firm) in the "trade_data" is within ±1 day of the announce date in the "announce_data" file, then the observation of that tradeday of that firm will have to be removed. For example, a firm has an announce date on 16 november 2004 in "announce_data". If that firm has a trade date either on 15 november 2004 or 16 november 2004 or 17 november 2004 then the observation of that trade date will have to be removed.  


Accepted Solutions
Solution
‎11-21-2016 01:26 PM
PROC Star
Posts: 1,561

Re: Working with Two Datasets

Like this?

 

data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;

data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;

data WANT;
  set TRADE;
  if _N_=1 then do;
    dcl hash ANNOUNCE(dataset:'ANNOUNCE');
    ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
    ANNOUNCE.definedone();
    call missing (ANNOUNCE_DATE);
  end;
  if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE  ) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;

View solution in original post


All Replies
Solution
‎11-21-2016 01:26 PM
PROC Star
Posts: 1,561

Re: Working with Two Datasets

Like this?

 

data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;

data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;

data WANT;
  set TRADE;
  if _N_=1 then do;
    dcl hash ANNOUNCE(dataset:'ANNOUNCE');
    ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
    ANNOUNCE.definedone();
    call missing (ANNOUNCE_DATE);
  end;
  if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE  ) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;

Regular Contributor
Posts: 183

Re: Working with Two Datasets

Thanks a lot. I think it works. By the by, is it possible to solve by using proc sql?

PROC Star
Posts: 1,561

Re: Working with Two Datasets

Like this?

 


proc sql;
  select *    
  from TRADE
  where not exists(select * 
                   from ANNOUNCE 
                   where TRADE.CUSIP = ANNOUNCE.CUSIP
                     and TRADE_DATE-1 <= ANNOUNCE_DATE <= TRADE_DATE+1 );
quit;

 

Regular Contributor
Posts: 183

Re: Working with Two Datasets

Hi,

SAS takes very long time if I use the proc sql. Even I needed to terminate the current process because SAS stopped working.

PROC Star
Posts: 1,561

Re: Working with Two Datasets

Yes the exists function is really slow in SAS, and it shows on large tables.

Hash tables are much faster.

So have your requirements changed? Is it now trading days and not calendar days?

If so you need a list of tradings days, your TRADE table doesn't have it.

PROC Star
Posts: 1,561

Re: Working with Two Datasets

A left join should be faster than exists()


proc sql;
  select TRADE.CUSIP, TRADE_DATE
  from TRADE
    left join
       ANNOUNCE
    on TRADE.CUSIP = ANNOUNCE.CUSIP
  group by 1, 2
  having  min(abs(sum(TRADE_DATE,-ANNOUNCE_DATE))) >1;
quit;

Regular Contributor
Posts: 183

Re: Working with Two Datasets

My trade table has trading days (the variable tradedate), not calendar days. So are your codes based on calendar days? My requirement was: if the tradedate in the Trade table is within ±1 calendar days of the announcement date in the Announce table then that traddate in the Trade table will have to removed.

Valued Guide
Posts: 797

Re: Working with Two Datasets

It's written as two steps, but it's really only one pass of the data since ANN_WINDOW is a data seet VIEW, not a data set FILE:

 

/* Make a data set view of trade dates that should be excluded*/
data ann_window/view=ann_window;
  set ann;
  do trade_date=announce_date-1 to announce_date+1;
    output;
  end;
run;

data want;
  merge trades ann_window (in=inw);
  by cusip trade_date;
  if inw=0;
run;
Regular Contributor
Posts: 183

Re: Working with Two Datasets

Hi,

For the second datastep in your first reply, merging does not work since the two datasets need to be sorted at first. Since ann_window is a view table it cannot be sorted. So what can be done?

Valued Guide
Posts: 797

Re: Working with Two Datasets

[ Edited ]

Your annoncements data set is not likely to be large, so sort it before making the windows.  I assume your trades dataset is already sorted by cusip/date.

 

MK

Valued Guide
Posts: 797

Re: Working with Two Datasets

[ Edited ]

Neither the PRoC SQL nor the hash solution took advantage of the sorted order of your data sets, which is why I  offered the solution using the data set view. 

 

However, a more challenging (and common problem for financial researchers) is to elminate a three-trading-windows centered on the announcment date, rather than a three calendar-day solutions.   this wold make the sql and hash a good deal more complicated, but the data-set-view technique holds up.  The main change below is the second SET statement in the DATA WANT step.  It has the FIRSTOBS=2 parement, used to look ahead one record, which in turn allows determination of whether the record-in-hand is one tradyiing day prior to an announcement.

 

data ta / view=ta;
  merge Trades  Announce (in=ina rename=(announce_date=trade_date));
  by cusip trade_date;
  in_a=ina;
run;

data want;
  set ta;
  by cusip;

  /* read ahead one record */
  if eof=0 then set ta (firstobs=2 keep=in_a rename=(in_a=next_in_a)) end=eof;
  else next_in_a=0;

  if in_a or
     (lag(in_a) and not(first.cusip)) or 
     (next_in_a and not(last.cusip))
  then delete;
run;
Regular Contributor
Posts: 183

Re: Working with Two Datasets

Following errors are shown for those codes:

 

ERROR 22-7: Invalid option name END.

ERROR 6-185: Missing ')' parenthesis for data set option list.

ERROR 79-322: Expecting a ).

Valued Guide
Posts: 797

Re: Working with Two Datasets

Forgot a closing parens. I've edited the program.Try it again.
Regular Contributor
Posts: 183

Re: Working with Two Datasets

Now it shows the following error:

 

ERROR: File WORK.VTEMP.DATA does not exist.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 1391 views
  • 1 like
  • 3 in conversation