Dear All, Good Afternoon!!
I want to get lead values based on DTM1 (time), which is a unique key in my data. For each row I want the BBP of the next minute (BBP1) and of the 5th minute ahead (BBP5).
I have a Dataset in the following format:
DTM1 BBP
03JUN2013:09:15:00 164.6
03JUN2013:09:16:00 167
03JUN2013:09:17:00 166
03JUN2013:09:18:00 164.15
03JUN2013:09:19:00 176.1
03JUN2013:09:20:00 169.3
03JUN2013:09:21:00 165
03JUN2013:09:22:00 180.7
03JUN2013:09:23:00 170
03JUN2013:09:24:00 180.3
03JUN2013:09:25:00 176
03JUN2013:09:26:00 175.5
03JUN2013:09:27:00 165.1
03JUN2013:09:28:00 164.3
03JUN2013:09:29:00 170
I want the dataset after processing to be like:
DTM1 BBP BBP1 BBP5
03JUN2013:09:15:00 164.6 167 169.3
03JUN2013:09:16:00 167 166 165
03JUN2013:09:17:00 166 164.15 180.7
03JUN2013:09:18:00 164.15 176.1 170
03JUN2013:09:19:00 176.1 169.3 180.3
03JUN2013:09:20:00 169.3 165 176
03JUN2013:09:21:00 165 180.7 175.5
03JUN2013:09:22:00 180.7 170 165.1
03JUN2013:09:23:00 170 180.3 164.3
03JUN2013:09:24:00 180.3 176 170
03JUN2013:09:25:00 176 175.5 .
03JUN2013:09:26:00 175.5 165.1 .
03JUN2013:09:27:00 165.1 164.3 .
03JUN2013:09:28:00 164.3 170 .
03JUN2013:09:29:00 170 . .
Please Help. Thanks in Advance.
Ritesh
It appears that there are no "holes" in your series. I.e. every minute is recorded, so a five-minute lead is always a 5-record lead.
This makes it simple: a 3-statement program as below. You just have to use the MERGE statement inside a DATA step. This program merges HAVE with two copies of itself, but one copy is offset by 1 observation (that's the FIRSTOBS=2 parameter below) and the other copy is offset by 5 observations (FIRSTOBS=6). Note the default is FIRSTOBS=1. So as a result you are reading in parallel:
obs1, obs2, obs6
then
obs2, obs3, obs7
obs3, obs4, obs8
etc.
The only problem is that each copy of HAVE has the same variables (BBP in your case), but you can't have 3 values simultaneously occupying one variable. So the second and third copies of HAVE use the "keep=bbp" parameter to read in just that one variable, and also a "rename=" parameter so that each merged copy of dataset HAVE has unique variable names. If you have several variables (say you also have bbsize, bop, bosize), then make sure to keep them all and rename them all (a sketch of that case follows the program below).
data have;
input DTM1 datetime. BBP;
format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00 164.6
03JUN2013:09:16:00 167
03JUN2013:09:17:00 166
03JUN2013:09:18:00 164.15
03JUN2013:09:19:00 176.1
03JUN2013:09:20:00 169.3
03JUN2013:09:21:00 165
03JUN2013:09:22:00 180.7
03JUN2013:09:23:00 170
03JUN2013:09:24:00 180.3
03JUN2013:09:25:00 176
03JUN2013:09:26:00 175.5
03JUN2013:09:27:00 165.1
03JUN2013:09:28:00 164.3
03JUN2013:09:29:00 170
run;
data want;
merge have
have (firstobs=2 keep=bbp rename=(bbp=bbp1))
have (firstobs=6 keep=bbp rename=(bbp=bbp5));
run;
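To make the multi-variable point concrete: if HAVE also carried, say, BBSIZE, each offset copy would keep and rename both variables. This is only an illustrative sketch (BBSIZE is not in the sample data posted above):
data want;
merge have
have (firstobs=2 keep=bbp bbsize rename=(bbp=bbp1 bbsize=bbsize1))
have (firstobs=6 keep=bbp bbsize rename=(bbp=bbp5 bbsize=bbsize5));
run;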
BTW, what's bbp? "best bid price"?
Now you probably have these time series for several stock symbols. Let's say your data is sorted by symbol/dtm1, not just dtm1 as above. Then you need a way to prevent the last few records of one symbol from getting "leads" from the first few records of the next symbol. The logic below gets lead values of SYMBOL as well as BBP; then, if the lead value of SYMBOL does not match the current SYMBOL, the corresponding BBP lead is set to missing:
data want (drop=_:);
merge have
have (firstobs=2 keep=symbol bbp rename=(symbol=_symbol1 bbp=bbp1))
have (firstobs=6 keep=symbol bbp rename=(symbol=_symbol5 bbp=bbp5));
if _symbol5^=symbol then do;
bbp5=.;
if _symbol1^=symbol then bbp1=.;
end;
run;
Note the "drop=_:" parameter tells sas not to output any vars whose name begins with the underscore.
Also note that the
"if _symbol1^=symbol then bbp1=.;"
statement is nested inside the DO group, so it executes only when _symbol5^=symbol. After all, there's no need to compare _SYMBOL1 against SYMBOL if _SYMBOL5 matches SYMBOL.
Caveat: Some SAS users might initially think to use MERGE with a BY statement, as in
merge have
have (firstobs=2 keep=symbol bbp rename=(bbp=bbp1))
have (firstobs=6 keep=symbol bbp rename=(bbp=bbp5));
by symbol;
The problem is that the one-record and 5-record offsets would be preserved for the first symbol, but the moment the second symbol starts they would no longer be offset at all. Thus you need the slightly tedious coding above to check SYMBOL against _SYMBOL1 and _SYMBOL5.
If you have irregular time series (i.e. holes in the time points), you'll have to give up on the MERGE statement and replace it with a collection of SET statements.
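One possible sketch along those lines (untested, and assuming HAVE is sorted by DTM1 with unique timestamps): a second SET statement with POINT= looks ahead up to 5 records and accepts a lead value only when the look-ahead timestamp is exactly 1 or 5 minutes after the current DTM1. Because the comparison uses the full datetime value, day boundaries fall out of the check automatically.
data want (drop=_:);
set have nobs=_nobs;
bbp1 = .;
bbp5 = .;
/* the exact 1- and 5-minute leads, if present, can be at most 5 records ahead */
do _p = _n_ + 1 to min(_n_ + 5, _nobs);
   set have (keep=dtm1 bbp rename=(dtm1=_dtm bbp=_bbp)) point=_p;
   if _dtm = dtm1 + 60  then bbp1 = _bbp;   /* record exactly 1 minute ahead  */
   if _dtm = dtm1 + 300 then bbp5 = _bbp;   /* record exactly 5 minutes ahead */
   if _dtm >= dtm1 + 300 then leave;        /* nothing later can match        */
end;
run;
A symbol check could be added in the same way if the data covers multiple stocks.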
Hi @mkeintz!!
First of all, I'd like to thank you; I really appreciate your proactive thinking.
Yes, BBP is the Best Bid Price for a stock for the minute interval 9:15:00 to 9:15:59.
I have segregated the data stock-wise, as I need to run my analysis for each stock separately.
The dataset runs from 03JUN2013:09:15:00 to 31AUG2013:15:29:59, which brings me to a problem similar to the one you described for different stocks.
For every day I have data from 9:15:00 to 15:29:59, and hence at the end of each day the BBP1 and BBP5 values need to be adjusted.
Also, as you pointed out about the holes: there are a few (very rare) minutes where no trades happened at all, so for those minutes there is no BBP and they have been skipped in the dataset.
It would be great if you could suggest something to tackle this problem.
Thanks a ton.
Ritesh
Use PROC TIMESERIES to fill the gaps and PROC EXPAND with a CONVERT statement for the lead. The documentation has an example of CONVERT usage.
You're missing an OUT statement.
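A rough sketch of that route, assuming SAS/ETS is licensed (the dataset names FILLED and WANT and the ACCUMULATE=/SETMISSING= choices are assumptions for illustration, not from the posts above). PROC TIMESERIES inserts the skipped minutes with missing BBP, which also means overnight minutes get inserted, so you may want to subset afterward; PROC EXPAND then takes observation leads on the now-regular series:
/* fill the skipped minutes with missing BBP */
proc timeseries data=have out=filled;
id dtm1 interval=minute accumulate=total setmissing=missing;
var bbp;
run;
/* observation leads: on a regular series, 1 record = 1 minute */
proc expand data=filled out=want method=none;
id dtm1;
convert bbp = bbp1 / transformout=(lead 1);
convert bbp = bbp5 / transformout=(lead 5);
run;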
Hi @mkeintz,
I modified the code as you suggested and it works perfectly fine. But as mentioned earlier, and as you also pointed out, when there are breaks in time it takes the next available value instead of generating a missing value.
Please have a look.
Data AmbujaMQTest;
Set AmbujaMQTest;
DT = Datepart(DTM1);
Format DT Date9.;
run;
data want (drop=_:);
merge AmbujaMQTest
AmbujaMQTest (firstobs=2 keep=DT bbp rename=(DT=_DT1 bbp=bbp1))
AmbujaMQTest (firstobs=6 keep=DT bbp rename=(DT=_DT5 bbp=bbp5));
if _DT5^=DT then do;
bbp5=.;
if _DT1^=DT then bbp1=.;
end;
run;
Data want;
Set Want;
Dif = DTM1 - Lag(DTM1);
run;
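One way to tighten that check so it also catches intraday gaps is to compare the full lead timestamps against the expected values rather than just the date part. A sketch along the same lines (note that when minutes are skipped, the record 5 positions ahead may not be the one exactly 5 minutes ahead even when such a record exists, so this guards against wrong values rather than recovering every possible lead; the SQL approach below does not have that limitation):
data want (drop=_:);
merge AmbujaMQTest
AmbujaMQTest (firstobs=2 keep=dtm1 bbp rename=(dtm1=_dtm1 bbp=bbp1))
AmbujaMQTest (firstobs=6 keep=dtm1 bbp rename=(dtm1=_dtm5 bbp=bbp5));
if _dtm1 ^= dtm1 + 60  then bbp1 = .;   /* next record is not exactly 1 minute later       */
if _dtm5 ^= dtm1 + 300 then bbp5 = .;   /* 5th record ahead is not exactly 5 minutes later */
run;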
SQL can take into account the time gap.
data have;
input DTM1 datetime. BBP;
format dtm1 datetime20.;
datalines;
03JUN2013:09:15:00 164.6
03JUN2013:09:16:00 167
03JUN2013:09:17:00 166
03JUN2013:09:18:00 164.15
03JUN2013:09:19:00 176.1
03JUN2013:09:20:00 169.3
03JUN2013:09:21:00 165
03JUN2013:09:22:00 180.7
03JUN2013:09:23:00 170
03JUN2013:09:24:00 180.3
03JUN2013:09:25:00 176
03JUN2013:09:26:00 175.5
03JUN2013:09:27:00 165.1
03JUN2013:09:28:00 164.3
03JUN2013:09:29:00 170
run;
proc sql;
create table want as
 select *,
        (select bbp from have where dtm1=a.dtm1+'00:01:00't) as bbp1,
        (select bbp from have where dtm1=a.dtm1+'00:05:00't) as bbp5
   from have as a;
quit;
Hi @Ksharp, Good Evening!!
Thanks a ton for replying, I really appreciate it. I ran the code and yes, it did take the time gap into account.
This is awesome.
Thanks Once Again.
Have a nice day ahead!!
Ritesh