Dear Experts,
I have two datasets which I attached: one is trade_data and another is announce_data. I need to creat a new dataset as following: if the tradedate of a cusip (i.e. firm) in the "trade_data" is within ±1 day of the announce date in the "announce_data" file, then the observation of that tradeday of that firm will have to be removed. For example, a firm has an announce date on 16 november 2004 in "announce_data". If that firm has a trade date either on 15 november 2004 or 16 november 2004 or 17 november 2004 then the observation of that trade date will have to be removed.
Like this?
data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;
data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;
data WANT;
set TRADE;
if _N_=1 then do;
dcl hash ANNOUNCE(dataset:'ANNOUNCE');
ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
ANNOUNCE.definedone();
call missing (ANNOUNCE_DATE);
end;
if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0
| ANNOUNCE.check(key:CUSIP, key:TRADE_DATE ) =0
| ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;
Like this?
data ANNOUNCE;
input CUSIP ANNOUNCE_DATE : ddmmyy10.;
cards;
00036110 08/01/2004
00036110 29/06/2004
00036110 22/09/2004
00036110 17/12/2004
00036110 16/03/2005
run;
data TRADE;
input TRADE_DATE ddmmyy10. YEAR CUSIP ;
cards;
05/01/2004 2004 00036110
05/01/2004 2004 00036110
06/01/2004 2004 00036110
07/01/2004 2004 00036110
23/01/2004 2004 00036110
run;
data WANT;
set TRADE;
if _N_=1 then do;
dcl hash ANNOUNCE(dataset:'ANNOUNCE');
ANNOUNCE.definekey('CUSIP','ANNOUNCE_DATE');
ANNOUNCE.definedone();
call missing (ANNOUNCE_DATE);
end;
if ANNOUNCE.check(key:CUSIP, key:TRADE_DATE-1) =0
| ANNOUNCE.check(key:CUSIP, key:TRADE_DATE ) =0
| ANNOUNCE.check(key:CUSIP, key:TRADE_DATE+1) =0 then delete;
run;
Thanks a lot. I think it works. By the by, is it possible to solve by using proc sql?
Like this?
proc sql;
select *
from TRADE
where not exists(select *
from ANNOUNCE
where TRADE.CUSIP = ANNOUNCE.CUSIP
and TRADE_DATE-1 <= ANNOUNCE_DATE <= TRADE_DATE+1 );
quit;
Hi,
SAS takes very long time if I use the proc sql. Even I needed to terminate the current process because SAS stopped working.
Yes the exists function is really slow in SAS, and it shows on large tables.
Hash tables are much faster.
So have your requirements changed? Is it now trading days and not calendar days?
If so you need a list of tradings days, your TRADE table doesn't have it.
A left join should be faster than exists()
proc sql;
select TRADE.CUSIP, TRADE_DATE
from TRADE
left join
ANNOUNCE
on TRADE.CUSIP = ANNOUNCE.CUSIP
group by 1, 2
having min(abs(sum(TRADE_DATE,-ANNOUNCE_DATE))) >1;
quit;
My trade table has trading days (the variable tradedate), not calendar days. So are your codes based on calendar days? My requirement was: if the tradedate in the Trade table is within ±1 calendar days of the announcement date in the Announce table then that traddate in the Trade table will have to removed.
It's written as two steps, but it's really only one pass of the data since ANN_WINDOW is a data seet VIEW, not a data set FILE:
/* Make a data set view of trade dates that should be excluded*/
data ann_window/view=ann_window;
set ann;
do trade_date=announce_date-1 to announce_date+1;
output;
end;
run;
data want;
merge trades ann_window (in=inw);
by cusip trade_date;
if inw=0;
run;
Hi,
For the second datastep in your first reply, merging does not work since the two datasets need to be sorted at first. Since ann_window is a view table it cannot be sorted. So what can be done?
Your annoncements data set is not likely to be large, so sort it before making the windows. I assume your trades dataset is already sorted by cusip/date.
MK
Neither the PRoC SQL nor the hash solution took advantage of the sorted order of your data sets, which is why I offered the solution using the data set view.
However, a more challenging (and common problem for financial researchers) is to elminate a three-trading-windows centered on the announcment date, rather than a three calendar-day solutions. this wold make the sql and hash a good deal more complicated, but the data-set-view technique holds up. The main change below is the second SET statement in the DATA WANT step. It has the FIRSTOBS=2 parement, used to look ahead one record, which in turn allows determination of whether the record-in-hand is one tradyiing day prior to an announcement.
data ta / view=ta;
merge Trades Announce (in=ina rename=(announce_date=trade_date));
by cusip trade_date;
in_a=ina;
run;
data want;
set ta;
by cusip;
/* read ahead one record */
if eof=0 then set ta (firstobs=2 keep=in_a rename=(in_a=next_in_a)) end=eof;
else next_in_a=0;
if in_a or
(lag(in_a) and not(first.cusip)) or
(next_in_a and not(last.cusip))
then delete;
run;
Following errors are shown for those codes:
ERROR 22-7: Invalid option name END.
ERROR 6-185: Missing ')' parenthesis for data set option list.
ERROR 79-322: Expecting a ).
Now it shows the following error:
ERROR: File WORK.VTEMP.DATA does not exist.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.