SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

computing a rolling centered median

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

computing a rolling centered median

Hello sas community!

 

My issue is the following. I have a large dataset containing ultra high frequency data (tick data), which I want to filter for outliers as suggested in the literature:

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   1.32473                   1.32487
01.01.2015:17:14:59.995   1.86743                   1.97473

 

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen Hansen Lunde Shephard (2009) if any of you are interested).

 

My specific issue is:

I want to delete all entries for which the so called mid-quote ((RateBid+RateAsk)/2) deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered (so 25 before and 25 after).  Here to be honest, I cannot figure out how to construct such a measure in sas.

 

To clarify, I need to compute a "rolling" median - let's call it M - that goes through the sample step by step and is constructed such that:

for given observations e.g. t1, t2,....,t25, tk ,tk+1,...,tk+25 ,  for observation tk the median is only computed of the values (t1-t25 and tk+1 to tk+25). And this has to run through all the observations in the sample. This is to ensure that unusual outliers, that are not in line with surrounding observations are removed, without removing any that might be e.g. the first after a discrete jump.

 

I hope you can help me with my issue. Thank you very much in advance!

 

Kind regards


Accepted Solutions
Solution
‎04-26-2018 11:59 AM
PROC Star
Posts: 1,209

Re: computing a rolling centered median

Posted in reply to NewSASuser2018

Do something like this

 

data have;
   infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
   informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
   format Time datetime23.3;
   input Time RateBid RateAsk seconds_time days;

   midquote=(RateBid+RateAsk)/2;
run;

proc expand data=have out=want;
   id time;
   convert midquote=rollingmedian / transformout=(cmovmed 50);
run;

Whenever you are to compute rolling statistics, think of PROC EXPAND Smiley Happy

View solution in original post


All Replies
Solution
‎04-26-2018 11:59 AM
PROC Star
Posts: 1,209

Re: computing a rolling centered median

Posted in reply to NewSASuser2018

Do something like this

 

data have;
   infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
   informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
   format Time datetime23.3;
   input Time RateBid RateAsk seconds_time days;

   midquote=(RateBid+RateAsk)/2;
run;

proc expand data=have out=want;
   id time;
   convert midquote=rollingmedian / transformout=(cmovmed 50);
run;

Whenever you are to compute rolling statistics, think of PROC EXPAND Smiley Happy

Contributor
Posts: 22

Re: computing a rolling centered median

I don't know why, but the solution triggers a weird error message for me. All of a sudden sas thinks that memory is insufficient. Don't know why.
PROC Star
Posts: 1,209

Re: computing a rolling centered median

Posted in reply to NewSASuser2018

Can you post your error message?

Contributor
Posts: 22

Re: computing a rolling centered median

Yes, sure:

ERROR: Unable to allocate sufficient memory. At least 1329709K bytes were requested, but only
       68786K were available. You must either increase the amount of memory available, or
       approach the problem differently.
ERROR: The SAS System stopped processing this step because of insufficient memory.

however, there should not be an issue. I have 16GB RAM and in my orking directories there is more than enough disc space available.

any thoughts?

 

This is what i get when I run proc options group=memory:

 

Group=MEMORY
 SORTSIZE=1073741824
                   Specifies the amount of memory that is available to the SORT procedure.
 SUMSIZE=0         Specifies a limit on the amount of memory that is available for data
                   summarization procedures when class variables are active.
 MAXMEMQUERY=0     Specifies the maximum amount of memory that is allocated for procedures.
 MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
 MEMMAXSZ=2147483648
                   Specifies the maximum amount of memory to allocate for using memory-based
                   libraries.
 LOADMEMSIZE=0     Specifies a suggested amount of memory that is needed for executable programs
                   loaded by SAS.
 MEMSIZE=2147483648
                   Specifies the limit on the amount of virtual memory that can be used during a
                   SAS session.
 REALMEMSIZE=0     Specifies the amount of real memory SAS can expect to allocate
PROC Star
Posts: 1,209

Re: computing a rolling centered median

Posted in reply to NewSASuser2018

Please mark the post as accepted solution if it solved your problem Smiley Happy

Contributor
Posts: 22

Re: computing a rolling centered median

I will definitely do so. First thing in the morning I will check whether it did!

Thanks for the suggested solution and the friendly reminder!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 239 views
  • 2 likes
  • 2 in conversation