I am running the below code for a data-set containing 708045 Observations for 2 variables. And Its Taking approx 41 minutes to run.
Is it reasonable? Am I doing something wrong?
%Macro HILO(T);
Data &T._ID; Set DA.&T._Final;
Keep DTM1 TradePrice;
run;
proc sql;
create table &T.HILO as
select *,
(select Max(TradePrice) from &T._ID where dtm1=a.dtm1+'00:01:00't ) as HI1,
(select Max(TradePrice) from &T._ID where dtm1=a.dtm1+'00:05:00't ) as HI5,
(select Min(TradePrice) from &T._ID where dtm1=a.dtm1+'00:01:00't ) as LO1,
(select Min(TradePrice) from &T._ID where dtm1=a.dtm1+'00:05:00't ) as LO5,
(select Count(*)from &T._ID where dtm1=a.dtm1+'00:01:00't ) as Count1,
(select Count(*)from &T._ID where a.dtm1 <= dtm1 <=a.dtm1+'00:05:00't ) as Count5
from &T._ID as a;
quit;
Data DA.&T.HILO; Set &T.HILO;
PVol_1min = SQRT(1/Count1*(log(Hi1/Lo1)**2));
PVol_5min = SQRT(1/Count5*(log(Hi5/Lo5)**2));
run;
Data Analysis.&T._Final;
Merge DA.&T.HILO DA.&T._Final;
By DTM1;
run;
proc datasets lib=work
nolist kill;
quit;
run;
%Mend;
%HILO(ACC);
OK. Assuming the interval between two neighbour obs always be one minute.
proc import datafile='/folders/myfolders/acc_test.txt' out=have dbms=tab replace;
run;
proc sql;
create table key as
select dtm1,max(tradeprice) as max,min(tradeprice) as min,count(tradeprice) as count
from have
group by dtm1;
quit;
data want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('dtm1');
h.definedata('max','min','count');
h.definedone();
end;
set have;
call missing(max,min,count);
k=dtm1+'00:01:00't;
rc=h.find(key:k);
L1=min;H1=max;COUNT1=count;
count5=0;
do i=0 to 5;
call missing(max,min,count);
k=dtm1+'00:01:00't*i;
rc=h.find(key:k);
L5=min;H5=max;COUNT5+count;
end;
drop i k rc max min count;
run;
How fine is your timeseries? I'd start by building the max, min and count values per smallest time unit, sort by time unit, and calculate the -1 and -5 minute times. Then a data step merge where you merge by the different time values would do it.
Hi @Kurt_Bremser,
Thanks for responding. It would be great if you can give me some hint about how to go about building the max, min and count values per smallest time unit, sort by time unit, and calculate the -1 and -5 minute times.
This is how my dataset looks like:
DTM1 TradePrice
6-3-13 9:15 | 1218 |
6-3-13 9:15 | 1218.05 |
6-3-13 9:15 | 1219.55 |
6-3-13 9:15 | 1219.6 |
6-3-13 9:15 | 1219.7 |
6-3-13 9:15 | 1219.7 |
6-3-13 9:15 | 1218 |
6-3-13 9:15 | 1218 |
6-3-13 9:15 | 1218 |
6-3-13 9:15 | 1218 |
6-3-13 9:15 | 1218 |
6-3-13 9:16 | 1219.25 |
6-3-13 9:16 | 1218.05 |
6-3-13 9:16 | 1218.05 |
6-3-13 9:16 | 1218 |
6-3-13 9:16 | 1219 |
6-3-13 9:16 | 1219 |
6-3-13 9:16 | 1219 |
6-3-13 9:16 | 1219.3 |
6-3-13 9:16 | 1218.8 |
6-3-13 9:16 | 1218.25 |
6-3-13 9:16 | 1218.2 |
6-3-13 9:17 | 1218 |
6-3-13 9:17 | 1218 |
6-3-13 9:17 | 1218.5 |
6-3-13 9:17 | 1218.55 |
6-3-13 9:17 | 1218.8 |
Thanks in advance.
Ritesh
If you have a SAS dataset, post it in a data step with datalines. Excel files cannot preserve the structure of SAS datasets.
This is important as I need to know how your datetime values are stored in SAS.
proc sort data=have;
by dtm1;
run;
data have2;
set have (keep=dtm1 tradeprice);
by dtm;
retain count max_price min_price;
if first.dtm1
then do;
count = 0;
max_price = 0;
min_price = 99999999999;
end;
count + 1;
max_price = max(tradeprice,max_price);
min_price = min(tradeprice,min_price);
if last.dtm1
then do;
time1 = intnx('minute',dtm1,-1);
time5 = intnx('minute',dtm1,-5);
output;
end;
drop tradeprice;
run;
data want;
merge
have1 (in=1 drop=time1 time5)
have1 (drop=dtm1 time5 rename=(time1=dtm1 count=count1 max_price=max_price1 min_price=min_price1))
have1 (drop=dtm1 time1 rename=(time5=dtm1 count=count5 max_price=max_price5 min_price=min_price5))
;
by dtm1;
if a;
run;
Untested (no SAS available on weekends).
Please post your log of my first datastep.
Hi @Kurt_Bremser, Thanks a ton for the help.
proc sort data=DA.ACC_Final;
by dtm1;
run;
Data ACC_New; Set DA.ACC_Final;
run;
data ACC_Test;
set DA.ACC_Final (keep=dtm1 tradeprice);
by dtm1;
retain count max_price min_price;
if first.dtm1
then do;
count = 0;
max_price = 0;
min_price = 99999999999;
end;
count + 1;
max_price = max(tradeprice,max_price);
min_price = min(tradeprice,min_price);
if last.dtm1
then do;
time1 = intnx('minute',dtm1,-1);
time5 = intnx('minute',dtm1,-5);
output;
end;
drop tradeprice;
run;
data ACC_HILO;
merge
ACC_New (in=1 drop=time1 time5)
ACC_New (drop=dtm1 time5 rename=(time1=dtm1 count=count1 max_price=max_price1 min_price=min_price1))
ACC_New (drop=dtm1 time1 rename=(time5=dtm1 count=count5 max_price=max_price5 min_price=min_price5))
;
by dtm1;
if a;
run;
Log:
303
304 proc sort data=DA.ACC_Final;
305 by dtm1;
306 run;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
307
308 Data ACC_New; Set DA.ACC_Final;
309 run;
NOTE: There were 708045 observations read from the data set DA.ACC_FINAL.
NOTE: The data set WORK.ACC_NEW has 708045 observations and 47 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.26 seconds
310
311 data ACC_Test;
312 set DA.ACC_Final (keep=dtm1 tradeprice);
313 by dtm1;
314 retain count max_price min_price;
315 if first.dtm1
316 then do;
317 count = 0;
318 max_price = 0;
319 min_price = 99999999999;
320 end;
321 count + 1;
322 max_price = max(tradeprice,max_price);
323 min_price = min(tradeprice,min_price);
324 if last.dtm1
325 then do;
326 time1 = intnx('minute',dtm1,-1);
327 time5 = intnx('minute',dtm1,-5);
328 output;
329 end;
330 drop tradeprice;
331 run;
NOTE: There were 708045 observations read from the data set DA.ACC_FINAL.
NOTE: The data set WORK.ACC_TEST has 23211 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.17 seconds
332
333 data ACC_HILO;
334 merge
335 ACC_New (in=1 drop=time1 time5)
-
24
24
ERROR 24-322: Variable name is not valid.
336 ACC_New (drop=dtm1 time5 rename=(time1=dtm1 count=count1 max_price=max_price1
336! min_price=min_price1))
337 ACC_New (drop=dtm1 time1 rename=(time5=dtm1 count=count5 max_price=max_price5
337! min_price=min_price5))
338 ;
ERROR: Variable time1 is not on file WORK.ACC_NEW.
ERROR: Variable count is not on file WORK.ACC_NEW.
ERROR: Variable max_price is not on file WORK.ACC_NEW.
ERROR: Variable min_price is not on file WORK.ACC_NEW.
ERROR: Invalid DROP, KEEP, or RENAME option on file WORK.ACC_NEW.
ERROR: Variable time5 is not on file WORK.ACC_NEW.
ERROR: Variable count is not on file WORK.ACC_NEW.
ERROR: Variable max_price is not on file WORK.ACC_NEW.
ERROR: Variable min_price is not on file WORK.ACC_NEW.
ERROR: Invalid DROP, KEEP, or RENAME option on file WORK.ACC_NEW.
339 by dtm1;
340 if a;
341 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.ACC_HILO may be incomplete. When this step was stopped there were 0
observations and 1 variables.
WARNING: Data set WORK.ACC_HILO was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
Stock price gyrations in 1 an 5 minute windows
https://goo.gl/JaA6wE
https://communities.sas.com/t5/forums/replypage/board-id/programming/message-id/76933
Not sure I created data like yours, maybe this will help?
I doubled the size of your input?
I don't have my 'beast' computer. This was done on an Dell E6400 dual code DDR2 2.2ghz 8gb ram(circa 2010)?
MAX and Mins --> 9.18 seconds
5 minute durations 0.30 seconds
1 minute durations 0.20 seconds
Did not do the rest but they should ve very fast.
I assummed the dtm1 was sorted and used left joins instead of outer joins
HAVE (sequence of minutes)
====
Datetimes in seconds (no loss of generality using seconds)
to 40 obs WORK.T_ID total obs=1,416,090
Obs DTM1 TRADEPRICE
1 1804269812 2
2 1804269816 57
3 1804269817 39
4 1804269818 39
5 1804269819 39
6 1804269820 39
7 1804269824 61
8 1804269828 70
9 1804269832 92
10 1804269836 81 This group is less than 5 minutes apart
11 1804269837 12
12 1804269838 12
13 1804269839 12
14 1804269840 12
15 1804269844 36
16 1804269848 49
17 1804269852 51
18 1804269856 30 5 that are less than or equal to 1 minute apart
19 1804269857 80
20 1804269858 80
21 1804269859 80
22 1804269860 80
23 1804269864 3
24 1804269868 70
25 1804269872 21
WANT
====
Max(TradePrice) where l.dtm1=r.dtm1+'00:05:00't ) as HI5
Min(TradePrice) where l.dtm1=r.dtm1+'00:05:00't ) as LO5
Max(TradePrice) where l.dtm1=r.dtm1+'00:01:00't ) as HI1
Min(TradePrice) where l.dtm1=r.dtm1+'00:01:00't ) as LO1
Up to 40 obs WORK.T_HILO total obs
Obs HI5 LO5 HI1 LO1
1 99 0 99 0
Up to 40 obs WORK.T_5SEC_LAP total obs=1
Obs COUNT5 (counts each of the durrations <=5?)
1 590038
Up to 40 obs WORK.T_1SEC_LAP total obs=1
Obs COUNT1 (counts each of the durrations <=1?)
1 354023
WORKING CODE
============
T_ID as l left join t_id(drop=tradeprice) as r on l.dtm1=r.dtm1+'00:05:00't
* create some data
==================
Data T_ID (sortedby=dtm1 keep=dtm1 tradeprice);
retain dtm1 0;
start=int(datetime());
dtm1=start;
do dtm1= start to start+5*708045 by 4;
tradeprice=int(100*uniform(5731));
savdtm1=dtm1;
if mod(dtm1,1000)=0 then do;
dtm1=savdtm1-3;
output;
dtm1=savdtm1-2;
output;
dtm1=savdtm1-1;
output;
end;
dtm1=savdtm1;
output;
end;
drop start;
run;quit;
MAX/MINS
=========
proc sql;
create table T_HILO as
select
(select Max(TradePrice) from T_ID as l left join t_id(drop=tradeprice) as r on l.dtm1=r.dtm1+'00:05:00't ) as HI5
,(select Min(TradePrice) from T_ID as l left join t_id(drop=tradeprice) as r on l.dtm1=r.dtm1+'00:05:00't ) as LO5
,(select Max(TradePrice) from T_ID as l left join t_id(drop=tradeprice) as r on l.dtm1=r.dtm1+'00:01:00't ) as HI1
,(select Min(TradePrice) from T_ID as l left join t_id(drop=tradeprice) as r on l.dtm1=r.dtm1+'00:01:00't ) as LO1
from
T_ID(obs=1)
;quit;
* Durations
===========
data t_5sec_lap(keep=cnt rename=cnt=count5);
retain beg . cnt 0;
set T_ID end=dne;
if _n_=1 then beg=dtm1;
else do;
if dtm1 - beg <= 5 then do;
cnt=cnt+1;
end;
else do;
beg=dtm1;
end;
end;
if dne then output;
run;quit;
data t_1sec_lap(keep=cnt rename=cnt=count1);;
retain beg . cnt 0;
set T_ID end=dne;
if _n_=1 then beg=dtm1;
else do;
if dtm1 - beg <= 1 then do;
cnt=cnt+1;
end;
else do;
beg=dtm1;
end;
end;
if dne then output;
run;quit;
The take home lesson of this topic is that it is difficult to get PROC SQL to take advantage of the order of observations in a data set, while @Kurt_Bremser's data step code does take advantage of the order.
In order to satisfy such conditions as
where dtm1=a.dtm1+'00:01:00't
sql has to effectively compare every record to every other record, to see if they are exactly 1 minute apart.
Kurt's code, on the other hand, knows that the order of the data set means that once a record is more than 1 minute after a reference record, then there is no need to check the remainder of the data set.
I am sure there are proc sql experts out there that can improve your code, but in my experience the data step approach has always been faster for matching ordered records based on the ordering variables.
proc import datafile='/folders/myfolders/acc_test.txt' out=have dbms=tab replace;
run;
proc sql;
create table key as
select dtm1,max(tradeprice) as max,min(tradeprice) as min,count(tradeprice) as count
from have
group by dtm1;
quit;
data want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('dtm1');
h.definedata('max','min','count');
h.definedone();
end;
set have;
call missing(max,min,count);
k=dtm1+'00:01:00't;
rc=h.find(key:k);
L1=min;H1=max;COUNT1=count;
call missing(max,min,count);
k=dtm1+'00:05:00't;
rc=h.find(key:k);
L5=min;H5=max;COUNT5=count;
drop k rc max min count;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.