Obsidian | Level 7

## computing a rolling centered median

Hello sas community!

My issue is the following. I have a large dataset containing ultra high frequency data (tick data), which I want to filter for outliers as suggested in the literature:

01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   1.32473                   1.32487
01.01.2015:17:14:59.995   1.86743                   1.97473

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen Hansen Lunde Shephard (2009) if any of you are interested).

My specific issue is:

I want to delete all entries for which the so called mid-quote ((RateBid+RateAsk)/2) deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered (so 25 before and 25 after).  Here to be honest, I cannot figure out how to construct such a measure in sas.

To clarify, I need to compute a "rolling" median - let's call it M - that goes through the sample step by step and is constructed such that:

for given observations e.g. t1, t2,....,t25, tk ,tk+1,...,tk+25 ,  for observation tk the median is only computed of the values (t1-t25 and tk+1 to tk+25). And this has to run through all the observations in the sample. This is to ensure that unusual outliers, that are not in line with surrounding observations are removed, without removing any that might be e.g. the first after a discrete jump.

I hope you can help me with my issue. Thank you very much in advance!

Kind regards

1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

## Re: computing a rolling centered median

Do something like this

``````data have;
informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
format Time datetime23.3;
input Time RateBid RateAsk seconds_time days;

run;

proc expand data=have out=want;
id time;
convert midquote=rollingmedian / transformout=(cmovmed 50);
run;``````

Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂

6 REPLIES 6
Tourmaline | Level 20

## Re: computing a rolling centered median

Do something like this

``````data have;
informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
format Time datetime23.3;
input Time RateBid RateAsk seconds_time days;

run;

proc expand data=have out=want;
id time;
convert midquote=rollingmedian / transformout=(cmovmed 50);
run;``````

Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂

Obsidian | Level 7

## Re: computing a rolling centered median

I don't know why, but the solution triggers a weird error message for me. All of a sudden sas thinks that memory is insufficient. Don't know why.
Tourmaline | Level 20

## Re: computing a rolling centered median

Can you post your error message?

Obsidian | Level 7

## Re: computing a rolling centered median

Yes, sure:

ERROR: Unable to allocate sufficient memory. At least 1329709K bytes were requested, but only
68786K were available. You must either increase the amount of memory available, or
approach the problem differently.
ERROR: The SAS System stopped processing this step because of insufficient memory.

however, there should not be an issue. I have 16GB RAM and in my orking directories there is more than enough disc space available.

any thoughts?

This is what i get when I run proc options group=memory:

``````Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0         Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0     Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0     Specifies a suggested amount of memory that is needed for executable programs
MEMSIZE=2147483648
Specifies the limit on the amount of virtual memory that can be used during a
SAS session.
REALMEMSIZE=0     Specifies the amount of real memory SAS can expect to allocate``````
Tourmaline | Level 20

## Re: computing a rolling centered median

Please mark the post as accepted solution if it solved your problem 🙂

Obsidian | Level 7

## Re: computing a rolling centered median

I will definitely do so. First thing in the morning I will check whether it did!

Thanks for the suggested solution and the friendly reminder!
Discussion stats
• 6 replies
• 1339 views
• 2 likes
• 2 in conversation