DATA Step, Macro, Functions and more

Speed up Custom "trading day" Interval in INTCK function

Accepted Solution Solved
Reply
New Contributor jab
New Contributor
Posts: 4
Accepted Solution

Speed up Custom "trading day" Interval in INTCK function

I created a custom interval for trading days using the following code:

* code to create trading day intervals;
proc sort data = dat.trading_day0;by caldt;run;
data trading_days (rename = (lag1dt=begin caldt=end)); set dat.trading_day0;
format lag1dt caldt date9.;
lag1dt = lag(caldt);run;

options intervalds=(TradingDay=work.trading_days);

 

Pretty straightforward. But when I want to use it to compute trading day intervals, it is *really* slow. 

For example, I run the following:

data test1; set dat.neg_events1;
	trday_gap_lag = intck('TradingDay',TradingDate,lag1date);
run;
data test1; set test1;
	weekday_gap_lag = intck('weekday',TradingDate,lag1date);
run;

The first data step takes 25 seconds, the second one takes 0.02 seconds. This is just a subsample of my data. On the full dataset, the first step takes almost 3 minutes, the second takes 0.25 seconds. Yet they are (almost) identical. Clearly there is some optimization on the second one that I don't have. 

Both necessary datasets attached (trading_day0 and neg_events1).

 

Attachment
Attachment

Accepted Solutions
Solution
‎02-22-2018 09:32 AM
PROC Star
Posts: 2,230

Re: Speed up Custom "trading day" Interval in INTCK function

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

 

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

 

 

 

View solution in original post


All Replies
Solution
‎02-22-2018 09:32 AM
PROC Star
Posts: 2,230

Re: Speed up Custom "trading day" Interval in INTCK function

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

 

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

 

 

 

New Contributor jab
New Contributor
Posts: 4

Re: Speed up Custom "trading day" Interval in INTCK function

Ok. That is helpful in terms of understanding and somewhat with speed. 

 

So, to your last point: Can I omit all 1 day entries then? I.e. I only need to include entries with weekends and the days that markets are closed? That would allow me to omit a lot of entries. The documentation wasn't clear to me: it said if there was no end date, it would just increment by 1, but I wasn't sure what the behavior would be if there were omissions.

 

Related: Is there a way to have my custom list "inherit" the 'weekday' specification? Then, I could just have a handful of entries per year for holidays when markets were closed. 

 

Finally: is there a way to make this a feature request? I imagine I'm not the only one doing research on financial markets who would use this feature. Getting a 'trading day' interval precompiled would be super helpful because financial market data can be quite large and therefore slow. 

Trusted Advisor
Posts: 1,288

Re: Speed up Custom "trading day" Interval in INTCK function

[ Edited ]

Yes, a trading date calendar would be great.  But which exchange(s)?  NYSE? LSE?  Bourse? etc.   Or maybe you would want global trading dates covering multiple exchanges?

 

And how far back should the calendars go?  All the normal sas calendar intervals can be relied on to accurately go back to the start of the Gregorian calendar (October 1582).  But each exchange would have a unique "birthdate", so there would be no common starting date.  Maybe such a function would only accept dates after 01jan1900.  But then there are younger exchanges to be accommodated.

 

Also, and likely more problematic: Normal SAS calendar intervals can be relied on for accurate future dates and intervals, but we don't know the future calendars of stock markets.  Not only do exchanges change their calendars by policy, they can also miss dates through unanticipated events.  On 9/11/2001 NYSE was closed early, and was closed all day on Tuesday 9/12 through Friday 9/15.

 

Of course, you might restrict such functions to reject arguments with future dates.  But that would not help when running such a function on 9/17/2001 (the NYSE reopen date).  It would regard the prior trading date as 9/15.  After all we don't update pre-compiled sas functions on a daily basis.

 

And as to speed: sas calendar functions using normal intervals ('day','week','weekday',etc.) are all algorithm based.  While functions using holidays, (or trading days) have to be based on a date-list instead of an algorithm.  I don't think we'll ever get functions using  trading-day intervals to be as fast as regular calendar units.

 

Edited note: In so far as holiday assignments are rule-based and not idiosyncratic from year to year, you might be able to improve speed.  But holidays - like trading days - can change over time.

Trusted Advisor
Posts: 1,288

Re: Speed up Custom "trading day" Interval in INTCK function

See if a hash lookup is any faster:

 

proc sort data = dat.trading_day0; by caldt; run;

data trading_day_index (rename=(caldt=_caldt));
  set dat.trading_day0;
  _index=_n_;
run;


data test1 (drop=_:);
  set dat.neg_events1;
  if _n_=1 then do;
    if 0 then set trading_day_index;
    declare hash h (dataset:'trading_day_index');
      h.definekey('_caldt');
      h.definedata('_index');
      h.definedone();
  end;

  do _d=tradingdate,lag1date;
    rc=h.find(key:_d);
    trday_gap_hash=dif(_index);
  end;
run;

 

 

Notes:

  1. Make sure that dat.trading_day0 covers all the values of tradingdate and lag1date.
  2. The "do _d=..." loop is used so that the DIF function it contains is a single dif QUEUE, even though two unique values (tradingdate, and lag1date) are submitted to that queue.  In other words this would fail:
          rc=h.find(key:tradingdate);
          trday_gap_hash=dif(_index);
          rc=h.find(key:lag1date);
          trday_gap_hash=dif(_index);
    because two independent DIF queues would be generated.
  3. If the DIF queue explanation is not sufficiently clear, then you could:
        rc=h.find(key:tradingdate);
        _index_from=_index;
        rc=h.find(key:lag1date);
        trday_gap_hash=_index-_index_from;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 170 views
  • 4 likes
  • 3 in conversation