Solved: Re: Speed up Custom "trading day" Interval in INTCK function

jab · Posted 02-16-2018 12:39 PM

I created a custom interval for trading days using the following code:

* code to create trading day intervals;
proc sort data = dat.trading_day0;by caldt;run;
data trading_days (rename = (lag1dt=begin caldt=end)); set dat.trading_day0;
format lag1dt caldt date9.;
lag1dt = lag(caldt);run;

options intervalds=(TradingDay=work.trading_days);

Pretty straightforward. But when I want to use it to compute trading day intervals, it is *really* slow.

For example, I run the following:

data test1; set dat.neg_events1;
	trday_gap_lag = intck('TradingDay',TradingDate,lag1date);
run;
data test1; set test1;
	weekday_gap_lag = intck('weekday',TradingDate,lag1date);
run;

The first data step takes 25 seconds, the second one takes 0.02 seconds. This is just a subsample of my data. On the full dataset, the first step takes almost 3 minutes, the second takes 0.25 seconds. Yet they are (almost) identical. Clearly there is some optimization on the second one that I don't have.

Both necessary datasets attached (trading_day0 and neg_events1).

ChrisNZ · Posted 02-18-2018 10:08 PM

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

High-Performance SAS Coding - Third Edition

View solution in original post

ChrisNZ · Posted 02-18-2018 10:08 PM

That's the way it is. The customer interval is not optimised (precompiled) so it runs slower.

I divided the run time by three by using

sasfile TRADING_DAYS  load;

If you must use custom intervals, make sure you have as few intervals as possible in that table.

High-Performance SAS Coding - Third Edition

jab · Posted 02-19-2018 11:34 AM

Ok. That is helpful in terms of understanding and somewhat with speed.

So, to your last point: Can I omit all 1 day entries then? I.e. I only need to include entries with weekends and the days that markets are closed? That would allow me to omit a lot of entries. The documentation wasn't clear to me: it said if there was no end date, it would just increment by 1, but I wasn't sure what the behavior would be if there were omissions.

Related: Is there a way to have my custom list "inherit" the 'weekday' specification? Then, I could just have a handful of entries per year for holidays when markets were closed.

Finally: is there a way to make this a feature request? I imagine I'm not the only one doing research on financial markets who would use this feature. Getting a 'trading day' interval precompiled would be super helpful because financial market data can be quite large and therefore slow.

mkeintz · Posted 02-19-2018 02:31 PM

Yes, a trading date calendar would be great. But which exchange(s)? NYSE? LSE? Bourse? etc. Or maybe you would want global trading dates covering multiple exchanges?

And how far back should the calendars go? All the normal sas calendar intervals can be relied on to accurately go back to the start of the Gregorian calendar (October 1582). But each exchange would have a unique "birthdate", so there would be no common starting date. Maybe such a function would only accept dates after 01jan1900. But then there are younger exchanges to be accommodated.

Also, and likely more problematic: Normal SAS calendar intervals can be relied on for accurate future dates and intervals, but we don't know the future calendars of stock markets. Not only do exchanges change their calendars by policy, they can also miss dates through unanticipated events. On 9/11/2001 NYSE was closed early, and was closed all day on Tuesday 9/12 through Friday 9/15.

Of course, you might restrict such functions to reject arguments with future dates. But that would not help when running such a function on 9/17/2001 (the NYSE reopen date). It would regard the prior trading date as 9/15. After all we don't update pre-compiled sas functions on a daily basis.

And as to speed: sas calendar functions using normal intervals ('day','week','weekday',etc.) are all algorithm based. While functions using holidays, (or trading days) have to be based on a date-list instead of an algorithm. I don't think we'll ever get functions using trading-day intervals to be as fast as regular calendar units.

Edited note: In so far as holiday assignments are rule-based and not idiosyncratic from year to year, you might be able to improve speed. But holidays - like trading days - can change over time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 02-19-2018 01:57 PM

See if a hash lookup is any faster:

proc sort data = dat.trading_day0; by caldt; run;

data trading_day_index (rename=(caldt=_caldt));
  set dat.trading_day0;
  _index=_n_;
run;


data test1 (drop=_:);
  set dat.neg_events1;
  if _n_=1 then do;
    if 0 then set trading_day_index;
    declare hash h (dataset:'trading_day_index');
      h.definekey('_caldt');
      h.definedata('_index');
      h.definedone();
  end;

  do _d=tradingdate,lag1date;
    rc=h.find(key:_d);
    trday_gap_hash=dif(_index);
  end;
run;

Notes:

Make sure that dat.trading_day0 covers all the values of tradingdate and lag1date.
The "do _d=..." loop is used so that the DIF function it contains is a single dif QUEUE, even though two unique values (tradingdate, and lag1date) are submitted to that queue. In other words this would fail:
      rc=h.find(key:tradingdate);
      trday_gap_hash=dif(_index);
      rc=h.find(key:lag1date);
      trday_gap_hash=dif(_index);
because two independent DIF queues would be generated.
If the DIF queue explanation is not sufficiently clear, then you could:
    rc=h.find(key:tradingdate);
    _index_from=_index;
    rc=h.find(key:lag1date);
    trday_gap_hash=_index-_index_from;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Speed up Custom "trading day" Interval in INTCK function

Re: Speed up Custom "trading day" Interval in INTCK function

Re: Speed up Custom "trading day" Interval in INTCK function

Re: Speed up Custom "trading day" Interval in INTCK function

Re: Speed up Custom "trading day" Interval in INTCK function

Re: Speed up Custom "trading day" Interval in INTCK function

Registration is open

SAS Training: Just a Click Away