BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NewSASuser2018
Obsidian | Level 7

Hello sas community!

 

My issue is the following. I have a large dataset containing ultra high frequency data (tick data), which I want to filter for outliers as suggested in the literature:

 

Time                                     RateBid               RateAsk .....
01.01.2015:17:12:12.445   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:32.565   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:13:40.685   xxxxxxxxxx              xxxxxxxxx
01.01.2015:17:14:59.895   1.32473                   1.32487
01.01.2015:17:14:59.995   1.86743                   1.97473

 

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen Hansen Lunde Shephard (2009) if any of you are interested).

 

My specific issue is:

I want to delete all entries for which the so called mid-quote ((RateBid+RateAsk)/2) deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered (so 25 before and 25 after).  Here to be honest, I cannot figure out how to construct such a measure in sas.

 

To clarify, I need to compute a "rolling" median - let's call it M - that goes through the sample step by step and is constructed such that:

for given observations e.g. t1, t2,....,t25, tk ,tk+1,...,tk+25 ,  for observation tk the median is only computed of the values (t1-t25 and tk+1 to tk+25). And this has to run through all the observations in the sample. This is to ensure that unusual outliers, that are not in line with surrounding observations are removed, without removing any that might be e.g. the first after a discrete jump.

 

I hope you can help me with my issue. Thank you very much in advance!

 

Kind regards

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Do something like this

 

data have;
   infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
   informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
   format Time datetime23.3;
   input Time RateBid RateAsk seconds_time days;

   midquote=(RateBid+RateAsk)/2;
run;

proc expand data=have out=want;
   id time;
   convert midquote=rollingmedian / transformout=(cmovmed 50);
run;

Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Do something like this

 

data have;
   infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
   informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
   format Time datetime23.3;
   input Time RateBid RateAsk seconds_time days;

   midquote=(RateBid+RateAsk)/2;
run;

proc expand data=have out=want;
   id time;
   convert midquote=rollingmedian / transformout=(cmovmed 50);
run;

Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂

NewSASuser2018
Obsidian | Level 7
I don't know why, but the solution triggers a weird error message for me. All of a sudden sas thinks that memory is insufficient. Don't know why.
PeterClemmensen
Tourmaline | Level 20

Can you post your error message?

NewSASuser2018
Obsidian | Level 7

Yes, sure:

ERROR: Unable to allocate sufficient memory. At least 1329709K bytes were requested, but only
       68786K were available. You must either increase the amount of memory available, or
       approach the problem differently.
ERROR: The SAS System stopped processing this step because of insufficient memory.

however, there should not be an issue. I have 16GB RAM and in my orking directories there is more than enough disc space available.

any thoughts?

 

This is what i get when I run proc options group=memory:

 

Group=MEMORY
 SORTSIZE=1073741824
                   Specifies the amount of memory that is available to the SORT procedure.
 SUMSIZE=0         Specifies a limit on the amount of memory that is available for data
                   summarization procedures when class variables are active.
 MAXMEMQUERY=0     Specifies the maximum amount of memory that is allocated for procedures.
 MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
 MEMMAXSZ=2147483648
                   Specifies the maximum amount of memory to allocate for using memory-based
                   libraries.
 LOADMEMSIZE=0     Specifies a suggested amount of memory that is needed for executable programs
                   loaded by SAS.
 MEMSIZE=2147483648
                   Specifies the limit on the amount of virtual memory that can be used during a
                   SAS session.
 REALMEMSIZE=0     Specifies the amount of real memory SAS can expect to allocate
PeterClemmensen
Tourmaline | Level 20

Please mark the post as accepted solution if it solved your problem 🙂

NewSASuser2018
Obsidian | Level 7
I will definitely do so. First thing in the morning I will check whether it did!

Thanks for the suggested solution and the friendly reminder!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1339 views
  • 2 likes
  • 2 in conversation