BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jab
Fluorite | Level 6 jab
Fluorite | Level 6

I created a custom interval for trading days using the following code:

* code to create trading day intervals;
proc sort data = dat.trading_day0;by caldt;run;
data trading_days (rename = (lag1dt=begin caldt=end)); set dat.trading_day0;
format lag1dt caldt date9.;
lag1dt = lag(caldt);run;

options intervalds=(TradingDay=work.trading_days);

 

Pretty straightforward. But when I want to use it to compute trading day intervals, it is *really* slow. 

For example, I run the following:

data test1; set dat.neg_events1;
	trday_gap_lag = intck('TradingDay',TradingDate,lag1date);
run;
data test1; set test1;
	weekday_gap_lag = intck('weekday',TradingDate,lag1date);
run;

The first data step takes 25 seconds, the second one takes 0.02 seconds. This is just a subsample of my data. On the full dataset, the first step takes almost 3 minutes, the second takes 0.25 seconds. Yet they are (almost) identical. Clearly there is some optimization on the second one that I don't have. 

Both necessary datasets attached (trading_day0 and neg_events1).

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

 

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

 

 

 

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

 

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

 

 

 

jab
Fluorite | Level 6 jab
Fluorite | Level 6

Ok. That is helpful in terms of understanding and somewhat with speed. 

 

So, to your last point: Can I omit all 1 day entries then? I.e. I only need to include entries with weekends and the days that markets are closed? That would allow me to omit a lot of entries. The documentation wasn't clear to me: it said if there was no end date, it would just increment by 1, but I wasn't sure what the behavior would be if there were omissions.

 

Related: Is there a way to have my custom list "inherit" the 'weekday' specification? Then, I could just have a handful of entries per year for holidays when markets were closed. 

 

Finally: is there a way to make this a feature request? I imagine I'm not the only one doing research on financial markets who would use this feature. Getting a 'trading day' interval precompiled would be super helpful because financial market data can be quite large and therefore slow. 

mkeintz
PROC Star

Yes, a trading date calendar would be great.  But which exchange(s)?  NYSE? LSE?  Bourse? etc.   Or maybe you would want global trading dates covering multiple exchanges?

 

And how far back should the calendars go?  All the normal sas calendar intervals can be relied on to accurately go back to the start of the Gregorian calendar (October 1582).  But each exchange would have a unique "birthdate", so there would be no common starting date.  Maybe such a function would only accept dates after 01jan1900.  But then there are younger exchanges to be accommodated.

 

Also, and likely more problematic: Normal SAS calendar intervals can be relied on for accurate future dates and intervals, but we don't know the future calendars of stock markets.  Not only do exchanges change their calendars by policy, they can also miss dates through unanticipated events.  On 9/11/2001 NYSE was closed early, and was closed all day on Tuesday 9/12 through Friday 9/15.

 

Of course, you might restrict such functions to reject arguments with future dates.  But that would not help when running such a function on 9/17/2001 (the NYSE reopen date).  It would regard the prior trading date as 9/15.  After all we don't update pre-compiled sas functions on a daily basis.

 

And as to speed: sas calendar functions using normal intervals ('day','week','weekday',etc.) are all algorithm based.  While functions using holidays, (or trading days) have to be based on a date-list instead of an algorithm.  I don't think we'll ever get functions using  trading-day intervals to be as fast as regular calendar units.

 

Edited note: In so far as holiday assignments are rule-based and not idiosyncratic from year to year, you might be able to improve speed.  But holidays - like trading days - can change over time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

See if a hash lookup is any faster:

 

proc sort data = dat.trading_day0; by caldt; run;

data trading_day_index (rename=(caldt=_caldt));
  set dat.trading_day0;
  _index=_n_;
run;


data test1 (drop=_:);
  set dat.neg_events1;
  if _n_=1 then do;
    if 0 then set trading_day_index;
    declare hash h (dataset:'trading_day_index');
      h.definekey('_caldt');
      h.definedata('_index');
      h.definedone();
  end;

  do _d=tradingdate,lag1date;
    rc=h.find(key:_d);
    trday_gap_hash=dif(_index);
  end;
run;

 

 

Notes:

  1. Make sure that dat.trading_day0 covers all the values of tradingdate and lag1date.
  2. The "do _d=..." loop is used so that the DIF function it contains is a single dif QUEUE, even though two unique values (tradingdate, and lag1date) are submitted to that queue.  In other words this would fail:
          rc=h.find(key:tradingdate);
          trday_gap_hash=dif(_index);
          rc=h.find(key:lag1date);
          trday_gap_hash=dif(_index);
    because two independent DIF queues would be generated.
  3. If the DIF queue explanation is not sufficiently clear, then you could:
        rc=h.find(key:tradingdate);
        _index_from=_index;
        rc=h.find(key:lag1date);
        trday_gap_hash=_index-_index_from;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1670 views
  • 4 likes
  • 3 in conversation