BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AbuChowdhury
Fluorite | Level 6

Dear Experts,

 

I have two datasets which I attached: one is trade_data and another is announce_data. I need to creat a new dataset as following: if the tradedate of a cusip (i.e. firm) in the "trade_data" is within ±1 day of the announce date in the "announce_data" file, then the observation of that tradeday of that firm will have to be removed. For example, a firm has an announce date on 16 november 2004 in "announce_data". If that firm has a trade date either on 15 november 2004 or 16 november 2004 or 17 november 2004 then the observation of that trade date will have to be removed.  

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

 

data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;

data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;

data WANT;
  set TRADE;
  if _N_=1 then do;
    dcl hash ANNOUNCE(dataset:'ANNOUNCE');
    ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
    ANNOUNCE.definedone();
    call missing (ANNOUNCE_DATE);
  end;
  if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE  ) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;

View solution in original post

17 REPLIES 17
ChrisNZ
Tourmaline | Level 20

Like this?

 

data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;

data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;

data WANT;
  set TRADE;
  if _N_=1 then do;
    dcl hash ANNOUNCE(dataset:'ANNOUNCE');
    ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
    ANNOUNCE.definedone();
    call missing (ANNOUNCE_DATE);
  end;
  if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE  ) =0 
   | ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;

AbuChowdhury
Fluorite | Level 6

Thanks a lot. I think it works. By the by, is it possible to solve by using proc sql?

ChrisNZ
Tourmaline | Level 20

Like this?

 


proc sql;
  select *    
  from TRADE
  where not exists(select * 
                   from ANNOUNCE 
                   where TRADE.CUSIP = ANNOUNCE.CUSIP
                     and TRADE_DATE-1 <= ANNOUNCE_DATE <= TRADE_DATE+1 );
quit;

 

AbuChowdhury
Fluorite | Level 6

Hi,

SAS takes very long time if I use the proc sql. Even I needed to terminate the current process because SAS stopped working.

ChrisNZ
Tourmaline | Level 20

Yes the exists function is really slow in SAS, and it shows on large tables.

Hash tables are much faster.

So have your requirements changed? Is it now trading days and not calendar days?

If so you need a list of tradings days, your TRADE table doesn't have it.

ChrisNZ
Tourmaline | Level 20

A left join should be faster than exists()


proc sql;
  select TRADE.CUSIP, TRADE_DATE
  from TRADE
    left join
       ANNOUNCE
    on TRADE.CUSIP = ANNOUNCE.CUSIP
  group by 1, 2
  having  min(abs(sum(TRADE_DATE,-ANNOUNCE_DATE))) >1;
quit;

AbuChowdhury
Fluorite | Level 6

My trade table has trading days (the variable tradedate), not calendar days. So are your codes based on calendar days? My requirement was: if the tradedate in the Trade table is within ±1 calendar days of the announcement date in the Announce table then that traddate in the Trade table will have to removed.

mkeintz
PROC Star

It's written as two steps, but it's really only one pass of the data since ANN_WINDOW is a data seet VIEW, not a data set FILE:

 

/* Make a data set view of trade dates that should be excluded*/
data ann_window/view=ann_window;
  set ann;
  do trade_date=announce_date-1 to announce_date+1;
    output;
  end;
run;

data want;
  merge trades ann_window (in=inw);
  by cusip trade_date;
  if inw=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AbuChowdhury
Fluorite | Level 6

Hi,

For the second datastep in your first reply, merging does not work since the two datasets need to be sorted at first. Since ann_window is a view table it cannot be sorted. So what can be done?

mkeintz
PROC Star

Your annoncements data set is not likely to be large, so sort it before making the windows.  I assume your trades dataset is already sorted by cusip/date.

 

MK

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Neither the PRoC SQL nor the hash solution took advantage of the sorted order of your data sets, which is why I  offered the solution using the data set view. 

 

However, a more challenging (and common problem for financial researchers) is to elminate a three-trading-windows centered on the announcment date, rather than a three calendar-day solutions.   this wold make the sql and hash a good deal more complicated, but the data-set-view technique holds up.  The main change below is the second SET statement in the DATA WANT step.  It has the FIRSTOBS=2 parement, used to look ahead one record, which in turn allows determination of whether the record-in-hand is one tradyiing day prior to an announcement.

 

data ta / view=ta;
  merge Trades  Announce (in=ina rename=(announce_date=trade_date));
  by cusip trade_date;
  in_a=ina;
run;

data want;
  set ta;
  by cusip;

  /* read ahead one record */
  if eof=0 then set ta (firstobs=2 keep=in_a rename=(in_a=next_in_a)) end=eof;
  else next_in_a=0;

  if in_a or
     (lag(in_a) and not(first.cusip)) or 
     (next_in_a and not(last.cusip))
  then delete;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AbuChowdhury
Fluorite | Level 6

Following errors are shown for those codes:

 

ERROR 22-7: Invalid option name END.

ERROR 6-185: Missing ')' parenthesis for data set option list.

ERROR 79-322: Expecting a ).

mkeintz
PROC Star
Forgot a closing parens. I've edited the program.Try it again.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AbuChowdhury
Fluorite | Level 6

Now it shows the following error:

 

ERROR: File WORK.VTEMP.DATA does not exist.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 4074 views
  • 1 like
  • 3 in conversation