I am doing an event study. I have a dataset containing trading days and the main dataset. The following code adds 10 trading days before and 100 days after each day in the main dataset. However, it is very slow. Is there another way to achieve the same task or How can I improve efficiency?
data tradingdays; set wrds.tradingdays;run;
options intervalds=(tradingdays=tradingdays);
proc sort data=main_sample;by stock date;run;
data main_sample(rename=(date=date_t0 date2=date));
set main_sample;
by stock date;
if first.date;
format date2 YYMMDD10.;
do t=-10 to 100; date2=intnx('tradingdays',date,t) ;output; end;
run;
Edit: Added a sample of the datasets.
An example of my main data looks like this:
stock | date |
AAA | 20190123 |
BBB | 20181110 |
CCC | 20100701 |
What I would like to achieve is something like this:
stock | date_t0 | date | t |
AAA | 20190123 | 20190117 | -3 |
AAA | 20190123 | 20190118 | -2 |
AAA | 20190123 | 20190122 | -1 |
AAA | 20190123 | 20190123 | 0 |
AAA | 20190123 | 20190124 | 1 |
AAA | 20190123 | 20190125 | 2 |
AAA | 20190123 | 20190128 | 3 |
BBB | 20181109 | 20181106 | -3 |
BBB | 20181109 | 20181107 | -2 |
BBB | 20181109 | 20181108 | -1 |
BBB | 20181109 | 20181109 | 0 |
BBB | 20181109 | 20181112 | 1 |
BBB | 20181109 | 20181113 | 2 |
BBB | 20181109 | 20181114 | 3 |
CCC | 20100701 | 20100628 | -3 |
CCC | 20100701 | 20100629 | -2 |
CCC | 20100701 | 20100630 | -1 |
CCC | 20100701 | 20100701 | 0 |
CCC | 20100701 | 20100702 | 1 |
CCC | 20100701 | 20100705 | 2 |
CCC | 20100701 | 20100706 | 3 |
I'm going to strike out this entire comment. I see that you are treating every date in your sample as an event date. So you will be producing a dataset 121 times the size of your original sample. Is that your intention?
Let's say wrds.tradingdays has one obs per trading day, with variable name TRDATE, then this can produce what you describe:
data want (drop=d d0 trdate); array trdates {2000} _temporary_; if _n_=1 then do d=1 by 1 until (end_of_trcalendar); set wrds.tradingdays (keep=trdate) end=end_of_trcalendar; trdates{d}=trdate; end; set main.sample; by stock date; if first.date; date_t0=date; d0=whichn(date_t0,of trdates{*}); do T=-10 to 100; date=trdates{d0+t}; output; end; run;
The technique here is to make an array of trading dates (TRDATES above). So if the event date matches the date in the 20th element of the (i.e. event date =TRDATES{20}) then all you have to do is loop from TRDATES{10} through TRDATES{120}.
The only "slow" part of this is the WHICHN function which matches your event date serially through the array. But you're only doing it once per event, so the cost may not be high.
But beware - in the above, the event date must be a trading date.
Hi ducman1611,
I don't think your code does what you describe it should do. To do what you describe, you need much simpler code, such as this:
options intervalds=(tradingdays=wrds.tradingdays);
data main_sample;
set main_sample;
date_10_before = intnx('tradingdays',date,-10);
date_100_after = intnx('tradingdays',date,100);
format date_10_before date_100_after YYMMDD10.;
run;
Hope this helps.
Hi, I think your code only create 2 extra variables with the before and after dates. What I aim to achieve is to create new observations before and after the dates from the main_sample dataset
How slow is your program? It would help if you posted your SAS log including notes showing how long it took and how many rows. If you are creating 110 rows for every input row and if you have a large number of input rows then that is going to take time regardless.
It has not finished running. I used the same code with "weekdays" and it finished after 20 seconds. When I replace it with my "tradingdays", it will probably take 20 hours
OK, so using INTERVALDS is slow. Perhaps you could re-engineer your program to generate all possible trading days only once into an intermediate dataset, then use that to build your final dataset. You could do this by getting the minimum and maximum dates from your main sample.
That is what I did. the TRADINGDAYS dataset has 1 column that contains only trading days. My code works as follows: it goes to each obs in the MAIN dataset (call this event day) and create 10 trading days before and 100 trading days after by going through the TRADINGDAYS dataset, and this is why it is so slow. this process also gives me is a T variable that tells how many trading days away from my event day. I don't know how to improve it
@somebody - If I understand your program correctly you are repeating your trading day calculation for each of your stocks. If you calculate the minimum and maximum days across all of your stocks you only need to do the trading days calculation once.
If would help if you posted some sample data so we can understand your problem better.
I edited the post, adding an example
Please enable option fullstimer to display more information in the log and then post the full log.
Hi ducman1611,
I recently wrote a blog post that solves the problem such as yours. Take a look at Shifting a date by a given number of workdays. Workdays are your trading days.
I used slightly different technique there, using SAS user-defined formats dynamically built from the calendar table.
There is also a discussion at the end of that post regarding using custom time intervals (you need your calendar table prepared in a certain way - to have BEGIN variable and format assigned), there is also a code example there.
Please take a look.
Best,
Leonid
I'm going to strike out this entire comment. I see that you are treating every date in your sample as an event date. So you will be producing a dataset 121 times the size of your original sample. Is that your intention?
Let's say wrds.tradingdays has one obs per trading day, with variable name TRDATE, then this can produce what you describe:
data want (drop=d d0 trdate); array trdates {2000} _temporary_; if _n_=1 then do d=1 by 1 until (end_of_trcalendar); set wrds.tradingdays (keep=trdate) end=end_of_trcalendar; trdates{d}=trdate; end; set main.sample; by stock date; if first.date; date_t0=date; d0=whichn(date_t0,of trdates{*}); do T=-10 to 100; date=trdates{d0+t}; output; end; run;
The technique here is to make an array of trading dates (TRDATES above). So if the event date matches the date in the 20th element of the (i.e. event date =TRDATES{20}) then all you have to do is loop from TRDATES{10} through TRDATES{120}.
The only "slow" part of this is the WHICHN function which matches your event date serially through the array. But you're only doing it once per event, so the cost may not be high.
But beware - in the above, the event date must be a trading date.
I tried your code and get an error:
ERROR: Array subscript out of range at line 444 column 13.
d=2001 end_of_trcalendar=0 trdate=19320917 PERMNO=. DATE=. FIRST.PERMNO=1 LAST.PERMNO=1 FIRST.DATE=1
LAST.DATE=1 date_t0=. d0=. T=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2002 observations read from the data set WORK.TRADINGDAYS.
NOTE: There were 1 observations read from the data set WORK.TEST.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0
observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
how do I fix this? btw, why did you cross out your comment?
Why do you have "if first.date" in your program? Does this mean you have a given permno/date combination repeated in main.sample? This is what gave rise to my striking out the program.
And notice the sas log reports date=. That is the source of this particular problem, since the program assumes you have valid DATE values in main.sample. You'll have to make sure you only have valid date values.
even i made sure the variable date has a valid value, the error is still returned. I notice that if I change the number in "array trdates {2000}" to {20000}, it solves the issue. Do you have any idea why this is the case?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.