Overlapping rolling window by 2 seconds in large dataset

Reply
Occasional Contributor
Posts: 5

Overlapping rolling window by 2 seconds in large dataset

I appreciate any help in setting up a program in sas where I can measure price changes in an interval of 2 seconds using a large dataset. SQL is taking a lot of time. The thing is for each 2 seconds, I determine the price changes at any point of time (let's say pt), and then take the starting price of the interval, and the ending price of the interval (around the price changes), to compute the difference between the prices and compute some ratio out of these differences. The computation is done in an overlapping window of 2 seconds. Thanks.  

Super User
Super User
Posts: 7,691

Re: Overlapping rolling window by 2 seconds in large dataset

Please follow the guidance next to the post a question button.  Post test data in the form of a datastep, and what the output should look like.  Should straight forward to have a retained variable, which when > 00:02 difference flips over to 0 again, but need to see some data.

Occasional Contributor
Posts: 5

Re: Overlapping rolling window by 2 seconds in large dataset

Thank you very much for your reply. I have attached a sample of my data. I have included a small sample of a given date between 8:47:00 and 10:00:00. "Changes" variable is when the price changes. Whenever price changes, I will take the price at the time of changes, let's say Pt.

I need to take two prices at different times of the same interval of 2 seconds: one second before price changes (StartPrice) and one second after price changes (EndPrice). I then compute three differences: |StartPrice-Pt| and |EndPrice-Pt| and |StartPrice-EndPrice|. These three values are used then to compute a certain ratio. This computation needs to be done in an overalpping window of 2 seconds, across several days. Please let me know if it is clear. 

Super User
Posts: 11,105

Re: Overlapping rolling window by 2 seconds in large dataset

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Occasional Contributor
Posts: 5

Re: Overlapping rolling window by 2 seconds in large dataset

Thank you for the clarification. I am new to this. Please find attached the txt file of sample data, as I don't have the SAS EU. Please let me know if this works. Thanks. 

Super User
Posts: 11,105

Re: Overlapping rolling window by 2 seconds in large dataset

[ Edited ]

Here's what an example datastep of data would look like;

data have;
   informat date anydtdte. time time11.3 price changes best10.    ;
   format date mmddyy10. time time11.3;
   input Date	Time	Price	changes;
   dttme = dhms(date,0,0,time);
   format dttme datetime22.3;
datalines;
15-OCT-2014	8:47:00.041	129.546875	.
15-OCT-2014	8:47:00.041	129.546875	0
15-OCT-2014	8:47:00.041	129.546875	0
15-OCT-2014	8:47:00.041	129.546875	0
15-OCT-2014	8:47:00.041	129.546875	0
15-OCT-2014	8:47:00.041	129.546875	0
15-OCT-2014	8:47:00.041	129.5625	   0.015625
15-OCT-2014	8:47:00.048	129.546875	-0.015625
15-OCT-2014	8:47:00.048	129.546875	0
15-OCT-2014	8:47:00.048	129.546875	0
15-OCT-2014	8:47:00.144	129.546875	0
15-OCT-2014	8:47:00.160	129.546875	0
15-OCT-2014	8:47:00.164	129.546875	0
15-OCT-2014	8:47:00.169	129.546875	0
15-OCT-2014	8:47:00.184	129.546875	0
15-OCT-2014	8:47:00.184	129.546875	0
15-OCT-2014	8:47:00.184	129.546875	0
15-OCT-2014	8:47:00.371	129.546875	0
15-OCT-2014	8:47:00.371	129.53125	-0.015625
15-OCT-2014	8:47:00.371	129.546875	0.015625
15-OCT-2014	8:47:00.371	129.546875	0
15-OCT-2014	8:47:00.910	129.546875	0
15-OCT-2014	8:47:00.910	129.546875	0
15-OCT-2014	8:47:00.910	129.546875	0
15-OCT-2014	8:47:00.910	129.546875	0
15-OCT-2014	8:47:00.913	129.546875	0
15-OCT-2014	8:47:00.913	129.546875	0
15-OCT-2014	8:47:00.913	129.546875	0
15-OCT-2014	8:47:00.913	129.546875	0
15-OCT-2014	8:47:00.920	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.017	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.020	129.546875	0
15-OCT-2014	8:47:01.029	129.546875	0
15-OCT-2014	8:47:01.032	129.546875	0
15-OCT-2014	8:47:01.200	129.546875	0
15-OCT-2014	8:47:01.200	129.53125	-0.015625
15-OCT-2014	8:47:01.986	129.546875	0.015625
15-OCT-2014	8:47:01.986	129.546875	0
15-OCT-2014	8:47:01.986	129.546875	0
15-OCT-2014	8:47:01.986	129.546875	0
15-OCT-2014	8:47:01.993	129.546875	0
15-OCT-2014	8:47:01.993	129.546875	0
15-OCT-2014	8:47:01.993	129.546875	0
15-OCT-2014	8:47:01.993	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:01.997	129.546875	0
15-OCT-2014	8:47:02.046	129.546875	0
15-OCT-2014	8:47:02.046	129.546875	0
15-OCT-2014	8:47:02.046	129.546875	0
15-OCT-2014	8:47:02.046	129.546875	0
15-OCT-2014	8:47:02.325	129.546875	0
15-OCT-2014	8:47:02.325	129.546875	0
15-OCT-2014	8:47:02.329	129.53125	-0.015625
15-OCT-2014	8:47:02.329	129.546875	0.015625
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
15-OCT-2014	8:47:02.552	129.546875	0
;
run;

I added a datetime variable to hold the date and time in one as when your data crosses midnight it is much easier to use one variable, the dttme, than all the coding to go around midnight to get intervals.

 

So hopefully you can show us 1) how/what you want to indicate as the start of an interval, 2) using the values above indicate the appropriate range for that start interval, and 3) manually calculate some of the values you re indicating.

 

Since your example data

15-OCT-2014 8:47:00.371 129.546875 0

15-OCT-2014 8:47:00.371 129.53125 -0.015625

15-OCT-2014 8:47:00.371 129.546875 0.015625

15-OCT-2014 8:47:00.371 129.546875 0

shows for time values identical to one thousandth of a second you may have and issue with tie breaking as the cumulative change shown for that time point is 0. I would guess that the program capturing the data is not prepared to report on the actual time or the output field is set up to round.

Occasional Contributor
Posts: 5

Re: Overlapping rolling window by 2 seconds in large dataset

Many thanks for the clarification and the explanation. I appreciate this. Please note that this data is in millisecond, and the changes in price are not cumulative. We migt have frequently observations at quite similar time. Everytime prices change, I consider this as an event. I then need to determine the StartPrice in the one-second before the price change and the EndPrice in the one-second following the price change. Since the time is in millisecond, I rarely have exactly 1 second before and after observations. So I need to consider the last price within one-second interval period around the price change. For example, the price change at 8:47:01:200 occurs when the price "pt" was 129.53125. For this price change, the StartPrice is determined within the 1-second interval before, i.e. 129.546875 around 8:47:00:200 (8:47:00.184) and the EndPrice is determined one-second after, i.e. 129.456875 around 8:47:02:200 (8:47:02.046).   

 

For every price change I need to determine this in an overlapping window. I then compute three differences: D1=abs(StartPrice-Pt); D2=abs(EndPrice-Pt); D3=abs(StratPrice-EndPrice). Many thanks. 

Super User
Posts: 11,105

Re: Overlapping rolling window by 2 seconds in large dataset

If you are only looking at your "event" then perhaps a first step is reducing the data to mostly events such as with:

data intermediate;
   set have;
   changedif = dif(changes);
   if changedif ne 0;
   drop changedif;
run;

Since your data is so long you might use a reduced set such as above to actually show more that a couple seconds and then the by-hand calculations.

 

One concern is when you say "overlapping" intervals that could mean that you need to create a bunch of variables but without a more concrete example what those variables actual results may look like isn't clear, at least to me.

 

If you don't use SAS datetime variables much you many not know that they are number (and fractions) of seconds. So there are functions and simple arithmetic to deal with intervals.

 

The concern about your repeated time values is when you have 2 or more values with the same timestamp how do you pick which one is the StartPrice and Endprice? They will all end up at the same interval difference.

 

Consider (using very nice values) that you have a price of 10 at time 8:00:00.000 and you have 3 values of price, 9.5,10.1 and 9.8 at time 7:59:59.000. Which of the values at the 7:59:59 time is the start price? 

Occasional Contributor
Posts: 5

Re: Overlapping rolling window by 2 seconds in large dataset

Many thanks for the swift response. At many occasions, I have price changes consecutively and randomly. I cannot do the analysis by regular intervals [1,2], [3,4], but by overlapping windows [1,3], [2,4].

In your example, my start price would be 9.5 - the first price submitted at that time (this is usual way of doing things in the field); the endprice will be the last one submitted at that time. There is no significant changes in prices, as shown in the data. Sometimes, StartPrice is equal to the EndPrice, as prices reverse quickly. Please let me know if this is clearer. Thanks. 

Ask a Question
Discussion stats
  • 8 replies
  • 134 views
  • 0 likes
  • 3 in conversation