- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Suppose that I have the following dataset:
Time Volume Price
10:02:04 100 100.25
10:02:07 200 100.35
10:02:14 300 100.55
10:02:18 100 100.60
10:02:18 300 100.70
10:02:23 200 100.80
And, I want to get the following dataset
Time Volume Price Diff
10:02:04 100 100.25 -0.0045 = (100.25 - 100.70)/100.25 = (price(t)-price(t+10))/price(t)
10:02:07 200 100.35 -0.0032 = (100.35 - 100.675)/100.35
10:02:14 300 100.55 -0.0012 = (100.55 - 100.675)/100.55
10:02:16 100 100.60 -0.0009 = (100.60 - 100.70)/100.60
10:02:16 300 100.70
So, as you can see, I need to find price change over 10 seconds (p(t) - p(t+10)) and then divide it by p(t). However, there are some issues. First, if I do not have p(t+10), then I need to select the price that is closest to p(t+10). Furthermore, if I have two prices in one second, I need to compute the volume-weighted average price for this second and then use it. For example, in my data, I have an interval of 10:02:07. I do not have a price at 10:02:17 and the closest one is the price at 10:02:16. However, in this interval, I have two prices. Therefore, I am computing the volume-weighted average price for this interval (((100/400)*100.60 + (300/400)*100.70) = 100.675) and then use this one.
Could you please help me with this?
I hope I could explain my question.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Khaladdin If you go for the SQL approach, you have to read from have instead of x like this
proc sql;
create table want as
select *,
(price-(select price from x where time between a.time and a.time+10 having time=max(time)))/price as diff format = 8.4
from have as a;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is time an actual SAS Time value?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also, why is it 100.70 in the first obs: (100.25 - 100.70)/100.25 ?
Shouldn't that be (100.25 - 100.55)/100.25 ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, sorry. It should be 100.55
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also. Is your data grouped by some ID variable?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No, I do not have ID in my data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
input Time :time8. Volume Price;
format Time time8.;
datalines;
10:02:04 100 100.25
10:02:07 200 100.35
10:02:14 300 100.55
10:02:16 100 100.60
10:02:16 300 100.70
10:02:23 200 100.80
;
data want(keep = Time Volume Price dif);
if _N_ = 1 then do;
dcl hash h(multidata : "Y");
h.definekey("t");
h.definedata("t", "v", "p");
h.definedone();
dcl hiter i("h");
dcl hash hh();
hh.definekey("t");
hh.definedata("t", "s");
hh.definedone();
do until (z);
set have end = z;
h.add(key : time, data : time, data : Volume, data : Price);
if hh.find(key : Time) ne 0 then s = Volume;
else s + Volume;
hh.replace(key : Time, data : Time, data : s);
end;
end;
set have;
t = .; v = .; p = .; s = .;
aa = constant('big');
if h.find(key : Time + 10) = 0 then dif = (Price - p) / Price;
else do;
do while (i.next() = 0);
if abs((time + 10) - t) < aa then do;
mt = t;
aa = abs((time + 10) - t);
end;
end;
rc = hh.find(key : mt);
do while (h.do_over(key : mt) = 0);
pp = sum(pp, divide(v, s) * p);
end;
dif = (Price - pp) / Price;
end;
format dif 8.4 t mt time8.;
run;
Result:
Time Volume Price dif 10:02:04 100 100.25 -0.0030 10:02:07 200 100.35 -0.0032 10:02:14 300 100.55 -0.0025 10:02:16 100 100.60 -0.0020 10:02:16 300 100.70 -0.0010 10:02:23 200 100.80 0.0000
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It works well in the existing, however, when I extend the data, I have some problems. Let's say, we have the following data:
data have;
input Time :time8. Volume Price;
format Time time8.;
datalines;
10:02:04 100 100.25
10:02:07 200 100.35
10:02:14 300 100.55
10:02:16 100 100.60
10:02:16 300 100.70
10:02:23 200 100.80
10:02:35 100 100.90
10:02:47 300 101.20
10:02:53 200 101.40
10:03:11 100 101.60
10:03:33 300 101.70
10:03:37 200 101.80
10:03:45 100 101.60
10:03:53 300 101.70
10:04:32 200 102.80
;
When I use this method, at 10:03:11, I am getting 0. Normally, it should be (101.60 - 101.70)/101.60. Furthermore, at time 10:03:53, I should get (101.70 - 102.80)/101.70. However, I am getting 0 with this code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. So we should not be restricted to 'closest within' 10 seconds, correct?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Just made a small correction.
See if this works
data have;
input Time :time8. Volume Price;
format Time time8.;
datalines;
10:02:04 100 100.25
10:02:07 200 100.35
10:02:14 300 100.55
10:02:16 100 100.60
10:02:16 300 100.70
10:02:23 200 100.80
10:02:35 100 100.90
10:02:47 300 101.20
10:02:53 200 101.40
10:03:11 100 101.60
10:03:33 300 101.70
10:03:37 200 101.80
10:03:45 100 101.60
10:03:53 300 101.70
10:04:32 200 102.80
;
data want(keep = Time Volume Price dif);
if _N_ = 1 then do;
dcl hash h(multidata : "Y");
h.definekey("t");
h.definedata("t", "v", "p");
h.definedone();
dcl hiter i("h");
dcl hash hh();
hh.definekey("t");
hh.definedata("t", "s");
hh.definedone();
do until (z);
set have end = z;
h.add(key : time, data : time, data : Volume, data : Price);
if hh.find(key : Time) ne 0 then s = Volume;
else s + Volume;
hh.replace(key : Time, data : Time, data : s);
end;
end;
set have;
t = .; v = .; p = .; s = .;
aa = constant('big');
if h.find(key : Time + 10) = 0 then dif = (Price - p) / Price;
else do;
do while (i.next() = 0);
if abs((time + 10) - t) < aa and Time ne t then do;
mt = t;
aa = abs((time + 10) - t);
end;
end;
rc = hh.find(key : mt);
do while (h.do_over(key : mt) = 0);
pp = sum(pp, divide(v, s) * p);
end;
dif = (Price - pp) / Price;
end;
format dif 8.4 t mt time8.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Really sorry. Still small issue. Normally, if there is a value within 10 seconds, then this value should be used first. However, if no value is available within 10 seconds, then the next one should be used (closest from outside 10 seconds).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have; input Time :time8. Volume Price; format Time time8.; datalines; 10:02:04 100 100.25 10:02:07 200 100.35 10:02:14 300 100.55 10:02:16 100 100.60 10:02:16 300 100.70 10:02:23 200 100.80 ; proc summary data=have nway; class Time ; var Price/weight=Volume; output out=x(drop=_:) mean=; run; proc sql; create table want as select *, (price-(select price from x where time between a.time and a.time+10 having time=max(time)))/price as diff from x as a; quit;