DATA Step, Macro, Functions and more

Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

Accepted Solution Solved
Reply
Contributor
Posts: 36
Accepted Solution

Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

 Dear All, Good Afternoon!!

 

I want to get the lead values based on DTM1 (Time), which is a Unique Key in my data. I want the BBP of next 5th minute and Next 1st minute. 

 

I have a Dataset in the following format:

 

DTM1                               BBP
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170

 

 

I want the dataset after processing to be like:

 

DTM1                               BBP        BBP1      BBP5
03JUN2013:09:15:00       164.6      167        169.3
03JUN2013:09:16:00       167         166        165
03JUN2013:09:17:00       166         164.15   180.7
03JUN2013:09:18:00       164.15    176.1     170
03JUN2013:09:19:00       176.1      169.3     180.3
03JUN2013:09:20:00       169.3      165        176
03JUN2013:09:21:00       165         180.7     175.5
03JUN2013:09:22:00       180.7      170        165.1
03JUN2013:09:23:00       170         180.3     164.3
03JUN2013:09:24:00       180.3      176        170
03JUN2013:09:25:00       176         175.5        .

03JUN2013:09:26:00       175.5      165.1        .
03JUN2013:09:27:00       165.1      164.3        .
03JUN2013:09:28:00       164.3      170           .
03JUN2013:09:29:00       170            .             .

 

Please Help. Thanks in Advance.

 

Ritesh


Accepted Solutions
Solution
‎03-03-2017 03:46 AM
Valued Guide
Posts: 797

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

[ Edited ]

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

 

This makes it simple - a 3-statement program as below.  You just have to use the MERGE statement inside a DATA step.  This program merges HAVE with two copies of itself.  But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6).   Note the default is firstobs=1.   So as a result you are reading in parallel

    obs1,   obs2,  obs6
then
    obs2,  obs3,  obs7
    obs3,  obs4,  obs8

etc.

 

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var.  So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names.   If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

 

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

 

BTW, what's bbp?  "best bid price"?

 

Now you probably have these time series for several stock symbols.  Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol.  The logic below is to get lead values of symbol as well as bbp.  Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:"   parameter tells sas not to output any vars whose name begins with the underscore.

 

 

Also note nesting of the

    "if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;"  After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

 

 

Caveat:  Some sas users might initially think to use MERGE with a BY statement, as in
    merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

    by symbol;

 

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all.   Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

 

 

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

View solution in original post


All Replies
Solution
‎03-03-2017 03:46 AM
Valued Guide
Posts: 797

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

[ Edited ]

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

 

This makes it simple - a 3-statement program as below.  You just have to use the MERGE statement inside a DATA step.  This program merges HAVE with two copies of itself.  But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6).   Note the default is firstobs=1.   So as a result you are reading in parallel

    obs1,   obs2,  obs6
then
    obs2,  obs3,  obs7
    obs3,  obs4,  obs8

etc.

 

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var.  So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names.   If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

 

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

 

BTW, what's bbp?  "best bid price"?

 

Now you probably have these time series for several stock symbols.  Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol.  The logic below is to get lead values of symbol as well as bbp.  Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:"   parameter tells sas not to output any vars whose name begins with the underscore.

 

 

Also note nesting of the

    "if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;"  After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

 

 

Caveat:  Some sas users might initially think to use MERGE with a BY statement, as in
    merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

    by symbol;

 

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all.   Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

 

 

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

Contributor
Posts: 36

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

[ Edited ]

Hi @mkeintz!! 

Firstly I'd like to thank you and I really appreciate the proactive thinking of yours.

 

Yes BBP was Best Bid Price for a stock for the minute ineterval 9:15:00 to 9:15:59. 

 

I have seggregated the data stock wise as I need to run my analysis for each stock separately.

 

The dataset runs from 03Jun2013:09:15:00 to 31AUG2013:15:29:59. Now this brings me to a similar problem as you suggested pertaining to different stocks. 

 

Now for everyday I have dataset from 9:15:00 am to 15:29:59 pm. And hence as the day ends The BBP1 and BBP5 need to be adjusted.

 

Also as you pointed out about the holes. There are a few (very rare occasional minutes) where trades have not happened at all and hence for that minute there is no BBP and that minute has been skipped from the dataset.

 

It will be great if you can suggest me somthing in order to tackle this problem.

 

Thanks a ton.

Ritesh

Super User
Posts: 17,784

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

Use proc time series to fill the gaps and use proc expand with a convert statement for the lead. The documentation has an example of convert usage. 

Contributor
Posts: 36

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

Hi @Reeza!!
Good Afternoon.

I did try PROC EXPAND earlier but that didn't help much.

I followed a similar code to this one:


proc expand data=your_dataset method=none;
by id;
convert salary = next_row_salary / transformout=(lead 1);
run;

Thanks.
Super User
Posts: 17,784

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

You're missing an OUT statement. 

Contributor
Posts: 36

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

Hi @mkeintz

 

I modified the code accordingly as you suggested and it works perfectly fine. But As mentioned earlier and as you also pointed out... When there are breaks in Time there its taking the next value instead of generating the missing value.

 

Plz. have a look.

 

Data AmbujaMQTest; Set AmbujaMQTest;
DT = Datepart(DTM1);
Format DT Date9.;
run;

data want (drop=_:);
  merge AmbujaMQTest
        AmbujaMQTest (firstobs=2 keep=DT bbp rename=(DT=_DT1 bbp=bbp1))
        AmbujaMQTest (firstobs=6 keep=DT bbp rename=(DT=_DT5 bbp=bbp5));
  if _DT5^=DT then do;
    bbp5=.;
    if _DT1^=DT then bbp1=.;
  end;
run;

Data want; Set Want;
Dif = DTM1 - Lag(DTM1);
run;
Super User
Posts: 9,671

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

[ Edited ]
SQL can take into account the time gap .




data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
proc sql;
create table want as
select *,(select bbp from have where dtm1=a.dtm1+'00:01:00't ) as bbp1,
 (select bbp from have where dtm1=a.dtm1+'00:5:00't ) as bbp5
 from have as a;
quit;

Contributor
Posts: 36

Re: Getting Lead Values based on time (Next 1 Minute, Next 5 Minute)

Hi @Ksharp, Good Evening!!

 

Thanks a ton for replying. Really appreciate it. I ran the code and yes it did take the timegap into account.

 

This is awesome. Man Happy

 

Thanks Once Again. 

Have a nice day ahead!!

 

Ritesh

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 154 views
  • 4 likes
  • 4 in conversation