DATA Step, Macro, Functions and more

Converting irregular time-series observations to periodical observations?

Reply
Occasional Contributor
Posts: 7

Converting irregular time-series observations to periodical observations?

Hi, 

I'm writing a thesis using tick-by-tick stock exchange data with "time irregular" trade price observations.

I want to "fix" the observation to periodical observations relative to the trade day open time (denoted as OpenFlag).

I used Proc SQL to do this conversion (see attached below / attached file), but it is very inefficient (my dataset is big with ~500,000 observations each day * 3,000 days * 30 variables).

 

Can this be done more efficiently using Proc Expand or Proc Timeseries? I already burned one hard disk trying .... 

 

I appriciate any help I can get.

 

Thanks,

 

Saggi

 

Here is the code I use:

/* Create example dataset */
Data Trades;
input TimeStamp time9. X OpenFlag;
Format TimeStamp time9.;
datalines;
09:00:00 61 0
09:00:01 98 0
09:00:02 54 0
09:00:03 93 0
09:00:04 83 0
09:00:10 42 0
09:00:11 62 1
09:00:12 46 0
09:00:13 60 0
09:00:14 92 0
09:00:15 54 0
09:00:16 29 0
;
run;

 

/*Create Fix 2 seconds time interval table */
Data FT;
set Trades;
if OpenFlag = 1;
rename TimeStamp=OpenTime;
run;
Data FT;
set FT;
Retain OpenTime FTime;
Do i= -60 to 60 by 2;
FTime=OpenTime+i;
output;
end;
Format FTime time9.;
Keep OpenTime FTime;
run;

 

/* Merge and create a fix time trades dataset */
PROC SQL;
CREATE TABLE WORK.FT_2Sec_Trades AS
SELECT t1.FTime,
t2.TimeStamp,
t2.OpenFlag,
t2.X
FROM FT t1
LEFT JOIN Trades t2 ON (t1.FTime >= t2.TimeStamp)
Group by t1.FTime having t2.TimeStamp = max(t2.TimeStamp)
ORDER BY t1.FTime,
t2.TimeStamp;
QUIT;

 

Trusted Advisor
Posts: 1,309

Re: Converting irregular time-series observations to periodical observations?

[ Edited ]
Posted in reply to SK12376574

For dealing with sorted large datasets, and producing datasets compare the sort key in one record to neighboring records, I  have never seen SQL come anywhere close to a well-formed data step:

 

Data Trades;
input TimeStamp time9. X OpenFlag;
Format TimeStamp time9.;
datalines;
09:00:00 61 0
09:00:01 98 0
09:00:02 54 0
09:00:03 93 0
09:00:04 83 0
09:00:10 42 0
09:00:11 62 1
09:00:12 46 0
09:00:13 60 0
09:00:14 92 0
09:00:15 54 0
09:00:16 29 0
;
run;

data want;
  set trades (where=(openflag=1))
      trades (where=(openflag=0));
  /*by ticker;*/

  if openflag=1 then opentime=timestamp;
  retain opentime;
  format opentime time8.0;

  if abs(timestamp-opentime)<=60;
  fixed_time= round(1+timestamp-opentime,2);
  format fixed_time time8.0;
run;

 

This program assumes the data are sorted by timestamp. If the data are actually sorted by  ticker/time, put  in a "by ticker;" statement.

 

 

 Edits:

If your data is sorted by  ticker/timestamp, then  use the "by ticker;" statement.

 

Also this program assumes 1 openflag=1 record per ticker.

Occasional Contributor
Posts: 7

Re: Converting irregular time-series observations to periodical observations?

Thanks for the prompt reply.

Since I need the code to extrapolate the last known value, I changed the code as following.

As you can see there are two "missing" observations 09:00:07 and 09:00:09 both of them should get the last known value of x=83.

I will appreciate any help in filling out the missing observation in  an efficient way (I used proc expand but it is sort of a black box for me).

Thanks, 

SK

 

Data Trades;
input TimeStamp time9. X OpenFlag;
Format TimeStamp time9.;
datalines;
09:00:00 61 0
09:00:01 98 0
09:00:02 54 0
09:00:03 93 0
09:00:04 83 0
09:00:10 42 0
09:00:11 62 1
09:00:12 46 0
09:00:13 60 0
09:00:14 92 0
09:00:15 54 0
09:00:16 29 0
;
run;
data want;
set trades (where=(openflag=1))
trades (where=(openflag=0));
if openflag=1 then opentime=timestamp;
retain opentime;
format opentime time8.0;
fixed_time=opentime+ceil((timestamp-opentime)/2)*2;
format fixed_time time8.0;
run;
Proc Sort data=want;
by fixed_time timestamp;
run;
Data Want;
set want;
by fixed_time timestamp;
if last.fixed_time;
run;

Trusted Advisor
Posts: 1,309

Re: Converting irregular time-series observations to periodical observations?

Posted in reply to SK12376574

x

 

  1. Why are you doing a PROC SORT, by fixed_time/timestamp?
    Fixed_time is a monotonic step function of timestamp.  So if the incoming data is sorted by timestamp, the result is already sorted by fixed_time/timestamp.  It would be a waste of disk input/output activity and time.

  2. If all you want is the last valid value at a given fixed_time, you can do that selection in the first data step.  You don't really need an extra DATA step - again a waste of disk activity.

    But if you prefer that technique, you can eliminate the disk activity by making the first data set a dataset view, and then read it to make the second data set as a dataset file, as in:
         data need / view=need;
           set trades .....;
           .....
         run;
         data want;
           set need;
           by fixed_time;
           if last.fixed_time;
         run;

    A data set view doesn't write data to disk,  it processes the data only when referenced in a later step.  Then instead of writing to disk, it streams the data to the calling step. 

 

As to holes in your timestamps, you can change the DATA WANT step to

 

data want (drop=nxt_Smiley Happy;
   merge need

         need (firstobs=2 keep=fixed_time rename=(fixed_time=nxt_fixed_time))  end=end_of_merge;

   if end_of_merge=0 then do fixed_time=fixed_time to nxt_fixed_time-2 by 2;

     output;

   end;

run;

run;

Occasional Contributor
Posts: 7

Re: Converting irregular time-series observations to periodical observations?

Thanks again for the help.

As for your remarks:

  1. I'm using the proc sort because the first data step used to merge the open time places the opentime observation as the first row whereas according to its timestamp (09:00:11) it should be in the middle.
  2. In your reply, you mentioned "you can do that selection in the first data step" - how?
    I didn't succeed doing that (I can do it using proc sql).
  3. The code for "filling the holes" got different results than I expected, I would greatly appreciate it if you can elaborate. 

Here are the results I'm looking for (the red rows are the "holes" I filled manually):

TimeStampXOpenFlagopentimefixed_time
09:00:0198009:00:1109:00:01
09:00:0393009:00:1109:00:03
09:00:0483009:00:1109:00:05
09:00:0483009:00:1109:00:07
09:00:0483009:00:1109:00:09
09:00:1162109:00:1109:00:11
09:00:1360009:00:1109:00:13
09:00:1554009:00:1109:00:15
09:00:1629009:00:1109:00:17

 

Thanks again, 

SK

Trusted Advisor
Posts: 1,309

Re: Converting irregular time-series observations to periodical observations?

Posted in reply to SK12376574

 


SK12376574 wrote:

Thanks again for the help.

As for your remarks:

  1. I'm using the proc sort because the first data step used to merge the open time places the opentime observation as the first row whereas according to its timestamp (09:00:11) it should be in the middle.
  2. In your reply, you mentioned "you can do that selection in the first data step" - how?
    I didn't succeed doing that (I can do it using proc sql).
  3. The code for "filling the holes" got different results than I expected, I would greatly appreciate it if you can elaborate. 

Here are the results I'm looking for (the red rows are the "holes" I filled manually):

TimeStamp X OpenFlag opentime fixed_time
09:00:01 98 0 09:00:11 09:00:01
09:00:03 93 0 09:00:11 09:00:03
09:00:04 83 0 09:00:11 09:00:05
09:00:04 83 0 09:00:11 09:00:07
09:00:04 83 0 09:00:11 09:00:09
09:00:11 62 1 09:00:11 09:00:11
09:00:13 60 0 09:00:11 09:00:13
09:00:15 54 0 09:00:11 09:00:15
09:00:16 29 0 09:00:11 09:00:17

 

Thanks again, 

SK


 

As to your first comment: if you want  a copy of the openflag record in it's proper order, then drop  the second "where=" parameter.  I.e. change

 

data want;
  set trades (where=(openflag=1))
      trades (where=(openflag=0));
  /*by ticker;*/

 

to

 

data want;
  set trades (where=(openflag=1))
      trades ;
  /*by ticker;*/

 

This allows the openflag=1 record to be read twice: once at the beginning (to extablish opentime) and a second time in chronological order, to provide other data for that trade.

 

 

 

More generally, this is a one-step program:

 

data Trades;
  input TimeStamp time9. X OpenFlag;
  Format TimeStamp time9.;
datalines;
09:00:00 61 0
09:00:01 98 0
09:00:02 54 0
09:00:03 93 0
09:00:04 83 0
09:00:10 42 0
09:00:11 62 1
09:00:12 46 0
09:00:13 60 0
09:00:14 92 0
09:00:15 54 0
09:00:16 29 0
;
run;

data want (drop=nxt_:  _:);
  /* Read the openflag trade, followed by all trades (include 2nd read of openflag trade*/
  set trades (where=(openflag=1) in=optrade)
      trades (in=alltrades)  
      end=end_of_tr;
  /*by ticker; */

  if optrade then do;  /* if optrade set opentime and delete this record ... it will be read again*/
    opentime=timestamp;
    fixed_opentime=round(1+opentime,2);
    _open_minus_60=fixed_opentime-60;    /*Minimum acceptable fixed_time*/
    _open_plus_60=fixed_opentime+60;     /*Maximum acceptable fixed_time*/
    delete;
  end;
  retain opentime fixed_opentime _open_minus_60 _open_plus_60;
  format opentime fixed_opentime _open_minus_60 _open_plus_60 time8.0;

  /* Read ahead one record to get next fixed_time*/
  if end_of_tr=0 then set trades (firstobs=2 keep=timestamp rename=(timestamp=nxt_timestamp));
  else nxt_timestamp=_open_plus_60+2;

  fixed_time=round(1+timestamp,2);
  nxt_fixed_time=round(1+nxt_timestamp,2);
  format fixed_time nxt_fixed_time time8.0;

  if nxt_fixed_time=fixed_time then delete;     /*keep only final rec for each fixed_time*/
  if nxt_fixed_time<_open_minus_60 then delete; /*drop records that are too early*/

  if fixed_time>_open_plus_60 then stop; /*replace "stop" by "delete" if sorted by ticker/timestamp */

  do fixed_time=max(fixed_time,_open_minus_60) to min(nxt_fixed_time-2,_open_plus_60) by 2;
    output;
  end;
run;

 

If the data set has multiple tickers, (i.e. sorted by ticker/timestamp instead of just timestamp), then you should do 2 things:

  1.  Uncomment the /*by ticker;*/ statement
  2. Change "stop" to  "delete" in the "if fixed_time>_open_plus_60 then stop;" statement.
Occasional Contributor
Posts: 7

Re: Converting irregular time-series observations to periodical observations?

I apologize for the delayed response - I fell ill.

Thanks for the complete solution, I highly appreciate it.

SK

Occasional Contributor
Posts: 7

Re: Converting irregular time-series observations to periodical observations?

Posted in reply to SK12376574

Sorry for the hassle, but I just notice the solution gives the wrong fixed_time (every 2 seconds relative to 09:12 instead of 09:11).

I changed the code a bit by subtracting 1 from the fixed_time and fixed_opentime variables (I haven't yet fully understand how the code works), so it seems to give the required result:

Thanks again, 

SK

 

data want (drop=nxt_: _: fixed_opentime);
/* Read the openflag trade, followed by all trades (include 2nd read of openflag trade*/
set trades (where=(openflag=1) in=optrade)
trades (in=alltrades)
end=end_of_tr;
/*by ticker; */

if optrade then do; /* if optrade set opentime and delete this record ... it will be read again*/
opentime=timestamp;
fixed_opentime=round(1+opentime,2)-1;
_open_minus_60=fixed_opentime-60; /*Minimum acceptable fixed_time*/
_open_plus_60=fixed_opentime+60; /*Maximum acceptable fixed_time*/
delete;
end;
retain opentime fixed_opentime _open_minus_60 _open_plus_60;
format opentime fixed_opentime _open_minus_60 _open_plus_60 time8.0;

/* Read ahead one record to get next fixed_time*/
if end_of_tr=0 then set trades (firstobs=2 keep=timestamp rename=(timestamp=nxt_timestamp));
else nxt_timestamp=_open_plus_60+2;

fixed_time=round(1+timestamp,2)-1;
nxt_fixed_time=round(1+nxt_timestamp,2);
format fixed_time nxt_fixed_time time8.0;

if nxt_fixed_time=fixed_time then delete; /*keep only final rec for each fixed_time*/
if nxt_fixed_time<_open_minus_60 then delete; /*drop records that are too early*/

if fixed_time>_open_plus_60 then stop; /*replace "stop" by "delete" if sorted by ticker/timestamp */

do fixed_time=max(fixed_time,_open_minus_60) to min(nxt_fixed_time-2,_open_plus_60) by 2;
output;
end;
run;

 

Ask a Question
Discussion stats
  • 7 replies
  • 236 views
  • 0 likes
  • 2 in conversation