Hello sas community!
My issue is the following. I have a large dataset containing ultra high frequency data (tick data), which I want to filter for outliers as suggested in the literature:
An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen Hansen Lunde Shephard (2009) if any of you are interested).
My specific issue is:
I want to delete all entries for which the so called mid-quote ((RateBid+RateAsk)/2) deviated by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered (so 25 before and 25 after). Here to be honest, I cannot figure out how to construct such a measure in sas.
To clarify, I need to compute a "rolling" median - let's call it M - that goes through the sample step by step and is constructed such that:
for given observations e.g. t1, t2,....,t25, tk ,tk+1,...,tk+25 , for observation tk the median is only computed of the values (t1-t25 and tk+1 to tk+25). And this has to run through all the observations in the sample. This is to ensure that unusual outliers, that are not in line with surrounding observations are removed, without removing any that might be e.g. the first after a discrete jump.
I hope you can help me with my issue. Thank you very much in advance!
Kind regards
Do something like this
data have;
infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
format Time datetime23.3;
input Time RateBid RateAsk seconds_time days;
midquote=(RateBid+RateAsk)/2;
run;
proc expand data=have out=want;
id time;
convert midquote=rollingmedian / transformout=(cmovmed 50);
run;
Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂
Do something like this
data have;
infile "c:\Users\Peter\Downloads\example.csv" firstobs=2 dlm=",";
informat Time datetime23.3 seconds_time 10. RateBid 10.8 RateAsk 10.8;
format Time datetime23.3;
input Time RateBid RateAsk seconds_time days;
midquote=(RateBid+RateAsk)/2;
run;
proc expand data=have out=want;
id time;
convert midquote=rollingmedian / transformout=(cmovmed 50);
run;
Whenever you are to compute rolling statistics, think of PROC EXPAND 🙂
Can you post your error message?
Yes, sure:
ERROR: Unable to allocate sufficient memory. At least 1329709K bytes were requested, but only
68786K were available. You must either increase the amount of memory available, or
approach the problem differently.
ERROR: The SAS System stopped processing this step because of insufficient memory.
however, there should not be an issue. I have 16GB RAM and in my orking directories there is more than enough disc space available.
any thoughts?
This is what i get when I run proc options group=memory:
Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable programs
loaded by SAS.
MEMSIZE=2147483648
Specifies the limit on the amount of virtual memory that can be used during a
SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate
Please mark the post as accepted solution if it solved your problem 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.