BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
somebody
Lapis Lazuli | Level 10

I am doing an event study. I have a dataset containing trading days and the main dataset. The following code adds 10 trading days before and 100 days after each day in the main dataset. However, it is very slow. Is there another way to achieve the same task or How can I improve efficiency?

data tradingdays; set wrds.tradingdays;run;
	options intervalds=(tradingdays=tradingdays);
	proc sort data=main_sample;by stock date;run;
	data main_sample(rename=(date=date_t0 date2=date)); 
		set main_sample;
		by stock date;
		if first.date;
		format date2 YYMMDD10.;
		do t=-10 to 100; date2=intnx('tradingdays',date,t)  ;output; end;
	run;

Edit: Added a sample of the datasets.
An example of my main data looks like this:

stockdate
AAA20190123
BBB20181110
CCC20100701

What I would like to achieve is something like this:

stockdate_t0datet
AAA2019012320190117-3
AAA2019012320190118-2
AAA2019012320190122-1
AAA20190123201901230
AAA20190123201901241
AAA20190123201901252
AAA20190123201901283
BBB2018110920181106-3
BBB2018110920181107-2
BBB2018110920181108-1
BBB20181109201811090
BBB20181109201811121
BBB20181109201811132
BBB20181109201811143
CCC2010070120100628-3
CCC2010070120100629-2
CCC2010070120100630-1
CCC20100701201007010
CCC20100701201007021
CCC20100701201007052
CCC20100701201007063
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I'm going to strike out this entire comment.  I see that you are treating every date in your sample as an event date.  So you will be producing a dataset 121 times the size of your original sample.  Is that your intention?

 

 

 

Let's say wrds.tradingdays has one obs per trading day, with variable name TRDATE, then this can produce what you describe:

 

 data want (drop=d d0 trdate);
   array trdates {2000} _temporary_;

   if _n_=1 then do d=1 by 1 until (end_of_trcalendar);
     set wrds.tradingdays (keep=trdate) end=end_of_trcalendar;
     trdates{d}=trdate;
   end;

   set main.sample;
   by stock date;

   if first.date;
   date_t0=date;
   d0=whichn(date_t0,of trdates{*});

   do T=-10 to 100;
     date=trdates{d0+t};
     output;
   end;
run;

The technique here is to make an array of trading dates (TRDATES above).  So if the event date matches the date in  the 20th element of the (i.e. event date =TRDATES{20}) then all you have to do is loop from TRDATES{10} through TRDATES{120}.

 

The only "slow" part of this is the WHICHN function which matches your event date serially through the array.  But you're only doing it once per event, so the cost may not be high.

 

But beware - in the above, the event date must be a trading date.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

14 REPLIES 14
LeonidBatkhan
Lapis Lazuli | Level 10

Hi ducman1611,

I don't think your code does what you describe it should do. To do what you describe, you need much simpler code, such as this:

options intervalds=(tradingdays=wrds.tradingdays);

data main_sample; 
   set main_sample;
   date_10_before = intnx('tradingdays',date,-10);
   date_100_after = intnx('tradingdays',date,100);
   format date_10_before date_100_after YYMMDD10.;
run;

Hope this helps.

 

somebody
Lapis Lazuli | Level 10

Hi, I think your code only create 2 extra variables with the before and after dates. What I aim to achieve is to create new observations before and after the dates from the main_sample dataset

SASKiwi
PROC Star

How slow is your program? It would help if you posted your SAS log including notes showing how long it took and how many rows. If you are creating 110 rows for every input row and if you have a large number of input rows then that is going to take time regardless.

somebody
Lapis Lazuli | Level 10

It has not finished running. I used the same code with "weekdays" and it finished after 20 seconds. When I replace it with my "tradingdays", it will probably take 20 hours

SASKiwi
PROC Star

OK, so using INTERVALDS is slow. Perhaps you could re-engineer your program to generate all possible trading days only once into an intermediate dataset, then use that to build your final dataset. You could do this by getting the minimum and maximum dates from your main sample.  

somebody
Lapis Lazuli | Level 10

That is what I did. the TRADINGDAYS dataset has 1 column that contains only trading days. My code works as follows: it goes to each obs in the MAIN dataset (call this event day) and create 10 trading days before and 100 trading days after by going through the TRADINGDAYS dataset, and this is why it is so slow. this process also gives me is a T variable that tells how many trading days away from my event day. I don't know how to improve it

SASKiwi
PROC Star

@somebody  - If I understand your program correctly you are repeating your trading day calculation for each of your stocks. If you calculate the minimum and maximum days across all of your stocks you only need to do the trading days calculation once.

 

If would help if you posted some sample data so we can understand your problem better.  

somebody
Lapis Lazuli | Level 10

I edited the post, adding an example

andreas_lds
Jade | Level 19

Please enable option fullstimer to display  more information in the log and then post the full log.

LeonidBatkhan
Lapis Lazuli | Level 10

Hi ducman1611,

I recently wrote a blog post that solves the problem such as yours. Take a look at Shifting a date by a given number of workdays. Workdays are your trading days.

I used slightly different technique there, using SAS user-defined formats dynamically built from the calendar table.

There is also a discussion at the end of that post regarding using custom time intervals (you need your calendar table prepared in a certain way - to have BEGIN variable and format assigned), there is also a code example there.

Please take a look.

Best,

Leonid

 

mkeintz
PROC Star

I'm going to strike out this entire comment.  I see that you are treating every date in your sample as an event date.  So you will be producing a dataset 121 times the size of your original sample.  Is that your intention?

 

 

 

Let's say wrds.tradingdays has one obs per trading day, with variable name TRDATE, then this can produce what you describe:

 

 data want (drop=d d0 trdate);
   array trdates {2000} _temporary_;

   if _n_=1 then do d=1 by 1 until (end_of_trcalendar);
     set wrds.tradingdays (keep=trdate) end=end_of_trcalendar;
     trdates{d}=trdate;
   end;

   set main.sample;
   by stock date;

   if first.date;
   date_t0=date;
   d0=whichn(date_t0,of trdates{*});

   do T=-10 to 100;
     date=trdates{d0+t};
     output;
   end;
run;

The technique here is to make an array of trading dates (TRDATES above).  So if the event date matches the date in  the 20th element of the (i.e. event date =TRDATES{20}) then all you have to do is loop from TRDATES{10} through TRDATES{120}.

 

The only "slow" part of this is the WHICHN function which matches your event date serially through the array.  But you're only doing it once per event, so the cost may not be high.

 

But beware - in the above, the event date must be a trading date.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
somebody
Lapis Lazuli | Level 10

I tried your code and get an error:

ERROR: Array subscript out of range at line 444 column 13.
d=2001 end_of_trcalendar=0 trdate=19320917 PERMNO=. DATE=. FIRST.PERMNO=1 LAST.PERMNO=1 FIRST.DATE=1
LAST.DATE=1 date_t0=. d0=. T=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2002 observations read from the data set WORK.TRADINGDAYS.
NOTE: There were 1 observations read from the data set WORK.TEST.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0
         observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

how do I fix this? btw, why did you cross out your comment?

mkeintz
PROC Star

Why do you have   "if first.date" in your program? Does this mean you have a given permno/date combination repeated in main.sample? This is what gave rise to my striking out the program.

 

And notice the sas log reports date=.   That is the source of this particular problem, since the program assumes you have valid DATE values in main.sample. You'll have to make sure you only have valid date values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
somebody
Lapis Lazuli | Level 10

even i made sure the variable date has a valid value, the error is still returned. I notice that if I change the number in "array trdates {2000}" to {20000}, it solves the issue. Do you have any idea why this is the case?

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2442 views
  • 0 likes
  • 5 in conversation