BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rkdubey84
Obsidian | Level 7

 Dear All, Good Afternoon!!

 

I want to get the lead values based on DTM1 (Time), which is a Unique Key in my data. I want the BBP of next 5th minute and Next 1st minute. 

 

I have a Dataset in the following format:

 

DTM1                               BBP
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170

 

 

I want the dataset after processing to be like:

 

DTM1                               BBP        BBP1      BBP5
03JUN2013:09:15:00       164.6      167        169.3
03JUN2013:09:16:00       167         166        165
03JUN2013:09:17:00       166         164.15   180.7
03JUN2013:09:18:00       164.15    176.1     170
03JUN2013:09:19:00       176.1      169.3     180.3
03JUN2013:09:20:00       169.3      165        176
03JUN2013:09:21:00       165         180.7     175.5
03JUN2013:09:22:00       180.7      170        165.1
03JUN2013:09:23:00       170         180.3     164.3
03JUN2013:09:24:00       180.3      176        170
03JUN2013:09:25:00       176         175.5        .

03JUN2013:09:26:00       175.5      165.1        .
03JUN2013:09:27:00       165.1      164.3        .
03JUN2013:09:28:00       164.3      170           .
03JUN2013:09:29:00       170            .             .

 

Please Help. Thanks in Advance.

 

Ritesh

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

 

This makes it simple - a 3-statement program as below.  You just have to use the MERGE statement inside a DATA step.  This program merges HAVE with two copies of itself.  But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6).   Note the default is firstobs=1.   So as a result you are reading in parallel

    obs1,   obs2,  obs6
then
    obs2,  obs3,  obs7
    obs3,  obs4,  obs8

etc.

 

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var.  So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names.   If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

 

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

 

BTW, what's bbp?  "best bid price"?

 

Now you probably have these time series for several stock symbols.  Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol.  The logic below is to get lead values of symbol as well as bbp.  Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:"   parameter tells sas not to output any vars whose name begins with the underscore.

 

 

Also note nesting of the

    "if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;"  After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

 

 

Caveat:  Some sas users might initially think to use MERGE with a BY statement, as in
    merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

    by symbol;

 

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all.   Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

 

 

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
mkeintz
PROC Star

It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.

 

This makes it simple - a 3-statement program as below.  You just have to use the MERGE statement inside a DATA step.  This program merges HAVE with two copies of itself.  But one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 obs (firstobs=6).   Note the default is firstobs=1.   So as a result you are reading in parallel

    obs1,   obs2,  obs6
then
    obs2,  obs3,  obs7
    obs3,  obs4,  obs8

etc.

 

The only problem is that each copy of HAVE has the same variables (bbp in your case), but you can't have 3 values simultaneously occupying one var.  So the second and third copies of have use the "keep=bbp" parameter to just read in just one var, and also a "rename=" parameter to make sure each merged copy of dataset HAVE has unique var names.   If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all.

 

data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
data want;
  merge have
        have (firstobs=2 keep=bbp rename=(bbp=bbp1))
        have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;

 

BTW, what's bbp?  "best bid price"?

 

Now you probably have these time series for several stock symbols.  Let's say your data is sorted by symbol/dtm1, not just dtm1 as above Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the second symbol.  The logic below is to get lead values of symbol as well as bbp.  Then if the lead value of symbol does not match current symbol, then set the corresponding bbp to missing:

data want (drop=_:);
  merge have
        have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
        have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
  if _symbol5^=symbol then do;
    bbp5=.;
    if _symbol1^=symbol then bbp1=.;
  end;
run;

Note the "drop=_:"   parameter tells sas not to output any vars whose name begins with the underscore.

 

 

Also note nesting of the

    "if _symbol1^=symbol then bbp1=.;"
statement inside a do group is executed only "if _symbol5^=symbol then do;"  After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.

 

 

Caveat:  Some sas users might initially think to use MERGE with a BY statement, as in
    merge have have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1)) have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));

    by symbol;

 

The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol is started the they will no longer be offset as all.   Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5, etc.

 

 

If you have irregular time series (i.e. holes in the time points), you'll have to give up on the merge statement, and replace it with a collection of SET statements.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rkdubey84
Obsidian | Level 7

Hi @mkeintz!! 

Firstly I'd like to thank you and I really appreciate the proactive thinking of yours.

 

Yes BBP was Best Bid Price for a stock for the minute ineterval 9:15:00 to 9:15:59. 

 

I have seggregated the data stock wise as I need to run my analysis for each stock separately.

 

The dataset runs from 03Jun2013:09:15:00 to 31AUG2013:15:29:59. Now this brings me to a similar problem as you suggested pertaining to different stocks. 

 

Now for everyday I have dataset from 9:15:00 am to 15:29:59 pm. And hence as the day ends The BBP1 and BBP5 need to be adjusted.

 

Also as you pointed out about the holes. There are a few (very rare occasional minutes) where trades have not happened at all and hence for that minute there is no BBP and that minute has been skipped from the dataset.

 

It will be great if you can suggest me somthing in order to tackle this problem.

 

Thanks a ton.

Ritesh

Reeza
Super User

Use proc time series to fill the gaps and use proc expand with a convert statement for the lead. The documentation has an example of convert usage. 

rkdubey84
Obsidian | Level 7
Hi @Reeza!!
Good Afternoon.

I did try PROC EXPAND earlier but that didn't help much.

I followed a similar code to this one:


proc expand data=your_dataset method=none;
by id;
convert salary = next_row_salary / transformout=(lead 1);
run;

Thanks.
rkdubey84
Obsidian | Level 7

Hi @mkeintz

 

I modified the code accordingly as you suggested and it works perfectly fine. But As mentioned earlier and as you also pointed out... When there are breaks in Time there its taking the next value instead of generating the missing value.

 

Plz. have a look.

 

Data AmbujaMQTest; Set AmbujaMQTest;
DT = Datepart(DTM1);
Format DT Date9.;
run;

data want (drop=_:);
  merge AmbujaMQTest
        AmbujaMQTest (firstobs=2 keep=DT bbp rename=(DT=_DT1 bbp=bbp1))
        AmbujaMQTest (firstobs=6 keep=DT bbp rename=(DT=_DT5 bbp=bbp5));
  if _DT5^=DT then do;
    bbp5=.;
    if _DT1^=DT then bbp1=.;
  end;
run;

Data want; Set Want;
Dif = DTM1 - Lag(DTM1);
run;
Ksharp
Super User
SQL can take into account the time gap .




data have;
  input DTM1 datetime.  BBP;
  format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00       164.6
03JUN2013:09:16:00       167
03JUN2013:09:17:00       166
03JUN2013:09:18:00       164.15
03JUN2013:09:19:00       176.1
03JUN2013:09:20:00       169.3
03JUN2013:09:21:00       165
03JUN2013:09:22:00       180.7
03JUN2013:09:23:00       170
03JUN2013:09:24:00       180.3
03JUN2013:09:25:00       176
03JUN2013:09:26:00       175.5
03JUN2013:09:27:00       165.1
03JUN2013:09:28:00       164.3
03JUN2013:09:29:00       170
run;
proc sql;
create table want as
select *,(select bbp from have where dtm1=a.dtm1+'00:01:00't ) as bbp1,
 (select bbp from have where dtm1=a.dtm1+'00:5:00't ) as bbp5
 from have as a;
quit;

rkdubey84
Obsidian | Level 7

Hi @Ksharp, Good Evening!!

 

Thanks a ton for replying. Really appreciate it. I ran the code and yes it did take the timegap into account.

 

This is awesome. Man Happy

 

Thanks Once Again. 

Have a nice day ahead!!

 

Ritesh

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1006 views
  • 4 likes
  • 4 in conversation