Hello SAS community!
My issue is the following. I have a large dataset containing ultra-high-frequency data (tick data). An example.csv is attached below.
I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen, Hansen, Lunde and Shephard (2009), if any of you are interested).
My specific issue is:
I want to delete all entries for which the so-called mid-quote, (RateBid+RateAsk)/2, deviates by more than 10 mean absolute deviations from a rolling centered median of the 50 observations around it (25 before and 25 after, excluding the observation under consideration). To be honest, I cannot figure out how to construct such a measure in SAS.
To clarify, I need to compute a "rolling" median, let's call it M, that steps through the sample one observation at a time and is constructed as follows:
for observation tk with neighbors tk-25, ..., tk-1, tk, tk+1, ..., tk+25, the median is computed only from the values at tk-25 to tk-1 and tk+1 to tk+25. This has to run through all the observations in the sample. The aim is to remove isolated outliers that are not in line with the surrounding observations, without removing any that might be, e.g., the first after a discrete jump.
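Written out, the rule might look like the following. This is only a sketch of how I read it: the symbols m, W, M and D are my own notation, and taking the mean absolute deviation over the same 50 neighbors, measured around the median, is an assumption rather than something the paper spells out here.

```latex
\begin{align*}
m_k &= \frac{\mathrm{RateBid}_k + \mathrm{RateAsk}_k}{2},
  & W_k &= \{\, i : 1 \le |i - k| \le 25 \,\}, \\
M_k &= \operatorname{median}\{\, m_i : i \in W_k \,\},
  & D_k &= \frac{1}{50} \sum_{i \in W_k} \lvert m_i - M_k \rvert, \\
&\text{delete observation } k \text{ if } \lvert m_k - M_k \rvert > 10\, D_k .
\end{align*}
```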
I hope you can help me with my issue. Thank you very much in advance!
Kind regards
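You're probably looking for something along these lines:
/* Example data: IBM rows from SASHELP.STOCKS */
data ibm;
  set sashelp.stocks;
  where stock='IBM';
  fake_date = _n_;  /* observation number, used as an increasing ID */
run;
/* Centered moving median over 51 observations (25 before, the current one, 25 after). */
/* Note that this window includes the current observation. */
proc expand data=ibm out=want;
  id fake_date;
  /* TRIMLEFT 25 sets the first 25 results to missing, where the window is incomplete */
  convert open = median_open / transformout=( cmovmed 51 trimleft 25);
run;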
PROC EXPAND is part of SAS/ETS. Run the following and check whether the log lists SAS/ETS:
proc product_status;run;
Mine shows:
For SAS/ETS ...
Custom version information: 14.1
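If SAS/ETS is not available, or the current observation has to be left out of the window, a plain DATA step can do the same job. The following is a rough, untested sketch rather than a definitive implementation. It assumes a hypothetical dataset WORK.TICKS, sorted by time, with the variables RateBid and RateAsk from the question. It also assumes that the mean absolute deviation is taken around the window median over the same neighbors, and that windows simply shrink near the start and end of the sample.

```sas
/* Assumption: WORK.TICKS is sorted by time and holds RateBid and RateAsk. */
data mid;
  set ticks;
  mid = (RateBid + RateAsk) / 2;
run;

/* Row count, used to size the in-memory array below. */
proc sql noprint;
  select count(*) into :n trimmed from mid;
quit;

data want;
  array m[&n] _temporary_;   /* all mid-quotes, held in memory */
  array w[50] _temporary_;   /* neighbors of the current row */

  /* Pass 1: load every mid-quote. */
  do k = 1 to &n;
    set mid(keep=mid);
    m[k] = mid;
  end;

  /* Pass 2: re-read each row and test it against its neighbors. */
  do k = 1 to &n;
    set mid;
    call missing(of w[*]);
    j = 0;
    do i = max(1, k - 25) to min(&n, k + 25);
      if i ne k then do;     /* leave out the row under test */
        j = j + 1;
        w[j] = m[i];
      end;
    end;
    med = median(of w[*]);   /* unused (missing) slots are ignored */
    dev = 0;
    do i = 1 to j;
      dev = dev + abs(w[i] - med);
    end;
    mad = dev / j;           /* mean absolute deviation around the median */
    if abs(mid - med) <= 10 * mad then output;  /* keep non-outliers only */
  end;
  stop;
  drop i j k dev;
run;
```

WANT keeps med and mad so the cutoff can be inspected. When all neighbors share the same mid-quote, mad is 0 and any row that differs from them is dropped, so runs of repeated quotes may need separate handling.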