Computing a rolling centered median

🔒 This topic is **solved** and **locked**.
Posted 04-25-2018 11:50 AM

Hello SAS community!

My issue is the following. I have a large dataset containing ultra-high-frequency data (tick data), which I want to filter for outliers as suggested in the literature:

| Time | RateBid | RateAsk | ..... |
|---|---|---|---|
| 01.01.2015:17:12:12.445 | xxxxxxxxxx | xxxxxxxxx | |
| 01.01.2015:17:13:32.565 | xxxxxxxxxx | xxxxxxxxx | |
| 01.01.2015:17:13:40.685 | xxxxxxxxxx | xxxxxxxxx | |
| 01.01.2015:17:14:59.895 | 1.32473 | 1.32487 | |
| 01.01.2015:17:14:59.995 | 1.86743 | 1.97473 | |
| 01.01.2015:17:13:32.565 | xxxxxxxxxx | xxxxxxxxx | |
| 01.01.2015:17:13:40.685 | xxxxxxxxxx | xxxxxxxxx | |
| 01.01.2015:17:14:59.895 | 1.32473 | 1.32487 | |
| 01.01.2015:17:14:59.995 | 1.86743 | 1.97473 | |

An example.csv is attached below. I have already removed many obvious data anomalies and now want to filter for outliers as suggested in the literature (e.g. Barndorff-Nielsen, Hansen, Lunde and Shephard (2009), if any of you are interested).

My specific issue is:

I want to delete all entries for which the so-called mid-quote, (RateBid+RateAsk)/2, deviates by more than 10 mean absolute deviations from a rolling centered median (excluding the observation under consideration) of the 50 observations around the one considered (so 25 before and 25 after). To be honest, I cannot figure out how to construct such a measure in SAS.

To clarify, I need to compute a "rolling" median (let's call it M) that moves through the sample step by step and is constructed as follows:

For observations t_{k-25}, ..., t_{k-1}, t_k, t_{k+1}, ..., t_{k+25}, the median for observation t_k is computed only from the values t_{k-25} to t_{k-1} and t_{k+1} to t_{k+25}, and this is repeated for every observation in the sample. This ensures that outliers that are out of line with the surrounding observations are removed, without removing observations that might be, for example, the first after a discrete jump.
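In symbols, one reading of this rule is the following, where $m_t$ denotes the mid-quote of observation $t$ and $W_k$ the 50 neighbours of observation $k$. The definition of the deviation measure $D_k$ (mean absolute deviation of the neighbours about their median) is an assumption; the description above does not pin it down.

```latex
\begin{align*}
m_t &= \tfrac{1}{2}\left(\text{RateBid}_t + \text{RateAsk}_t\right), \\
W_k &= \{k-25, \dots, k-1,\; k+1, \dots, k+25\}, \\
M_k &= \operatorname{median}\{\, m_j : j \in W_k \,\}, \\
D_k &= \frac{1}{50} \sum_{j \in W_k} \lvert m_j - M_k \rvert, \\
\text{delete observation } k \quad &\text{if} \quad \lvert m_k - M_k \rvert > 10\, D_k .
\end{align*}
```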

I hope you can help me with my issue. Thank you very much in advance!

Kind regards

1 ACCEPTED SOLUTION

You're probably looking for something along these lines:

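```
/* Build a demo series with a sequential ID, so that PROC EXPAND */
/* treats the observations as equally spaced.                     */
data ibm;
    set sashelp.stocks;
    where stock='IBM';
    fake_date = _n_;
run;

/* CMOVMED 51: centered moving median over a 51-observation window. */
/* TRIMLEFT 25: set the first 25 results to missing.                */
proc expand data=ibm out=want;
    id fake_date;
    convert open = median_open / transformout=(cmovmed 51 trimleft 25);
run;
```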
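Note that `cmovmed 51` takes the median over all 51 observations in the window, including the one under consideration, whereas the question asks for it to be excluded. If that matters, a DATA step along the following lines might be a starting point. This is a minimal sketch, not a tested implementation. It assumes a hypothetical dataset `work.ticks` that is already sorted by time, has no missing `RateBid` or `RateAsk` values, and is small enough that 50 random-access reads per observation are acceptable. The names `ticks_mid`, `flagged`, `nbr_mid`, `med_nbr`, `mad_nbr` and `outlier` are made up for illustration, and the mean absolute deviation is taken about the neighbours' median, which is an assumption.

```sas
/* Mid-quote for every tick (dataset and variable names are assumptions). */
data ticks_mid;
    set ticks;
    mid = (RateBid + RateAsk) / 2;
run;

data flagged;
    set ticks_mid nobs=n_total;          /* sequential read: observation _n_ */
    array w[50] _temporary_;             /* up to 50 neighbouring mid-quotes  */
    call missing(of w[*]);               /* temporary arrays are retained     */
    n_nbr = 0;

    /* Collect the 25 observations before and the 25 after, skipping _n_. */
    do j = _n_ - 25 to _n_ + 25;
        if j ne _n_ and 1 <= j <= n_total then do;
            set ticks_mid(keep=mid rename=(mid=nbr_mid)) point=j;
            n_nbr = n_nbr + 1;
            w[n_nbr] = nbr_mid;
        end;
    end;

    /* Median of the neighbours; MEDIAN ignores unused (missing) slots. */
    med_nbr = median(of w[*]);

    /* Mean absolute deviation of the neighbours about that median. */
    mad_nbr = 0;
    do i = 1 to n_nbr;
        mad_nbr = mad_nbr + abs(w[i] - med_nbr);
    end;
    if n_nbr > 0 then mad_nbr = mad_nbr / n_nbr;
    else mad_nbr = .;

    /* 1 = more than 10 mean absolute deviations away from the median. */
    outlier = (n_nbr > 0 and abs(mid - med_nbr) > 10 * mad_nbr);
    drop i nbr_mid;
run;
```

Near the start and end of the sample the window holds fewer than 50 neighbours; the sketch simply uses whatever is available, which is a design choice rather than something the question prescribes. Keeping only the rows with `outlier = 0` would then give the filtered data.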

6 REPLIES

Do you have SAS/ETS? If so, PROC EXPAND is what you're looking for; see the CONVERT statement examples in the documentation. If you don't, you have other options, but this is the first approach I'd take. You may have to construct a different time ID, since you're looking at the nearest 50 trades regardless of the date/time difference.
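One option for such an ID is a plain observation counter, so that "nearest 50 trades" means nearest in row order rather than in clock time. A rough sketch, assuming a hypothetical dataset `work.ticks` whose `Time` variable is a numeric SAS datetime; the names `ticks_sorted`, `ticks_id`, `obs_id`, `mid` and `mid_median` are invented for illustration. `method=none` is added so that PROC EXPAND only applies the transformation and does not interpolate the series first.

```sas
/* Put the ticks in time order (dataset and variable names are assumptions). */
proc sort data=ticks out=ticks_sorted;
    by Time;
run;

/* Sequential ID and mid-quote. */
data ticks_id;
    set ticks_sorted;
    obs_id = _n_;                      /* 1, 2, 3, ... in time order */
    mid    = (RateBid + RateAsk) / 2;  /* mid-quote                  */
run;

/* Centered moving median over 51 observations, indexed by obs_id. */
proc expand data=ticks_id out=ticks_med method=none;
    id obs_id;
    convert mid = mid_median / transformout=(cmovmed 51);
run;
```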

I work with SAS 9.4 as far as I remember, if that is what you are asking. Your idea looks as if it is what I am looking for. Regarding the time ID, you are right: my raw data time stamps are not equidistant. What kind of time ID would you suggest instead?

Thank you very much!

Run the following and check whether the log includes SAS/ETS:

`proc product_status;run;`

Mine shows:

For SAS/ETS ...

Custom version information: 14.1

I get the same.

For SAS/ETS ...

Custom version information: 14.3

Thanks! Looks great!
