Solved: Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

rkdubey84 · Posted 03-03-2017 02:15 AM

Dear All, Good Afternoon!!

I want to get the lead values based on DTM1 (Time), which is a Unique Key in my data. I want the BBP of next 5th minute and Next 1st minute.

I have a Dataset in the following format:

DTM1 BBP
03JUN2013:09:15:00 164.6
03JUN2013:09:16:00 167
03JUN2013:09:17:00 166
03JUN2013:09:18:00 164.15
03JUN2013:09:19:00 176.1
03JUN2013:09:20:00 169.3
03JUN2013:09:21:00 165
03JUN2013:09:22:00 180.7
03JUN2013:09:23:00 170
03JUN2013:09:24:00 180.3
03JUN2013:09:25:00 176
03JUN2013:09:26:00 175.5
03JUN2013:09:27:00 165.1
03JUN2013:09:28:00 164.3
03JUN2013:09:29:00 170

I want the dataset after processing to be like:

DTM1 BBP BBP1 BBP5
03JUN2013:09:15:00 164.6 167 169.3
03JUN2013:09:16:00 167 166 165
03JUN2013:09:17:00 166 164.15   180.7
03JUN2013:09:18:00 164.15 176.1   170
03JUN2013:09:19:00 176.1 169.3   180.3
03JUN2013:09:20:00 169.3 165 176
03JUN2013:09:21:00 165 180.7   175.5
03JUN2013:09:22:00 180.7 170 165.1
03JUN2013:09:23:00 170 180.3   164.3
03JUN2013:09:24:00 180.3 176 170
03JUN2013:09:25:00 176 175.5 .

03JUN2013:09:26:00 175.5 165.1 .
03JUN2013:09:27:00 165.1 164.3 .
03JUN2013:09:28:00 164.3 170 .
03JUN2013:09:29:00 170 . .

Please Help. Thanks in Advance.

Ritesh

mkeintz · Posted 03-03-2017 02:55 AM

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

This makes it simple - a 3-statement program as below. You just have to use the MERGE statement inside a DATA step. This program merges HAVE with two copies of itself. But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6). Note the default is firstobs=1. So as a result you are reading in parallel

    obs1,   obs2, obs6
then
    obs2, obs3, obs7
    obs3, obs4, obs8

etc.

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var. So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names. If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

BTW, what's bbp? "best bid price"?

Now you probably have these time series for several stock symbols. Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol. The logic below is to get lead values of symbol as well as bbp. Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:" parameter tells sas not to output any vars whose name begins with the underscore.

Also note nesting of the

"if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;" After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

Caveat: Some sas users might initially think to use MERGE with a BY statement, as in
merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

by symbol;

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all. Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 03-03-2017 02:55 AM

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

This makes it simple - a 3-statement program as below. You just have to use the MERGE statement inside a DATA step. This program merges HAVE with two copies of itself. But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6). Note the default is firstobs=1. So as a result you are reading in parallel

    obs1,   obs2, obs6
then
    obs2, obs3, obs7
    obs3, obs4, obs8

etc.

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var. So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names. If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

BTW, what's bbp? "best bid price"?

Now you probably have these time series for several stock symbols. Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol. The logic below is to get lead values of symbol as well as bbp. Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:" parameter tells sas not to output any vars whose name begins with the underscore.

Also note nesting of the

"if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;" After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

Caveat: Some sas users might initially think to use MERGE with a BY statement, as in
merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

by symbol;

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all. Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

rkdubey84 · Posted 03-03-2017 03:16 AM

Hi @mkeintz!!

Firstly I'd like to thank you and I really appreciate the proactive thinking of yours.

Yes BBP was Best Bid Price for a stock for the minute ineterval 9:15:00 to 9:15:59.

I have seggregated the data stock wise as I need to run my analysis for each stock separately.

The dataset runs from 03Jun2013:09:15:00 to 31AUG2013:15:29:59. Now this brings me to a similar problem as you suggested pertaining to different stocks.

Now for everyday I have dataset from 9:15:00 am to 15:29:59 pm. And hence as the day ends The BBP1 and BBP5 need to be adjusted.

Also as you pointed out about the holes. There are a few (very rare occasional minutes) where trades have not happened at all and hence for that minute there is no BBP and that minute has been skipped from the dataset.

It will be great if you can suggest me somthing in order to tackle this problem.

Thanks a ton.

Ritesh

Reeza · Posted 03-03-2017 03:36 AM

Use proc time series to fill the gaps and use proc expand with a convert statement for the lead. The documentation has an example of convert usage.

rkdubey84 · Posted 03-03-2017 03:46 AM

Hi @Reeza!!
Good Afternoon.

I did try PROC EXPAND earlier but that didn't help much.

I followed a similar code to this one:

proc expand data=your_dataset method=none;
by id;
convert salary = next_row_salary / transformout=(lead 1);
run;

Thanks.

Reeza · Posted 03-03-2017 05:16 AM

You're missing an OUT statement.

rkdubey84 · Posted 03-03-2017 03:38 AM

Hi @mkeintz,

I modified the code accordingly as you suggested and it works perfectly fine. But As mentioned earlier and as you also pointed out... When there are breaks in Time there its taking the next value instead of generating the missing value.

Plz. have a look.

Data AmbujaMQTest; Set AmbujaMQTest;
DT = Datepart(DTM1);
Format DT Date9.;
run;

data want (drop=_:);
  merge AmbujaMQTest
        AmbujaMQTest (firstobs=2 keep=DT bbp rename=(DT=_DT1 bbp=bbp1))
        AmbujaMQTest (firstobs=6 keep=DT bbp rename=(DT=_DT5 bbp=bbp5));
  if _DT5^=DT then do;
    bbp5=.;
    if _DT1^=DT then bbp1=.;
  end;
run;

Data want; Set Want;
Dif = DTM1 - Lag(DTM1);
run;

Ksharp · Posted 03-03-2017 05:30 AM

SQL can take into account the time gap .




data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
proc sql;
create table want as
select *,(select bbp from have where dtm1=a.dtm1+'00:01:00't ) as bbp1,
 (select bbp from have where dtm1=a.dtm1+'00:5:00't ) as bbp5
 from have as a;
quit;

rkdubey84 · Posted 03-03-2017 05:43 AM

Hi @Ksharp, Good Evening!!

Thanks a ton for replying. Really appreciate it. I ran the code and yes it did take the timegap into account.

This is awesome.

Thanks Once Again.

Have a nice day ahead!!

Ritesh

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away