Outlier detection

Posted 04-15-2012 12:26 AM
(46322 views)

Good day to all,

I am doing corporate finance research using panel data (i.e., longitudinal data). All the observations are firm-years.

I wish to detect outliers before running any regressions. I have read some of the SAS articles, but none of them describes a single method that addresses the needs of panel data.

I am writing to ask whether anyone can point me to useful references (e.g., books, articles, or macros) on outlier detection for panel data (i.e., longitudinal data) using SAS.

Thank you in advance for any help and suggestions.

Regards,

MSPAK

Accepted Solutions

Hi, mine is a belated reply, as I only recently started specializing in statistics, especially with SAS. I came to this site to solve a problem similar to the one you posed and, fortunately, with the help given by different members I managed to solve mine. This may be of help to you, provided you have not already solved it in the years since :).

```
/* Replace your_dataset and varname with your own data set and variable names */
proc univariate data=your_dataset robustscale plot;
  var varname;
run;
```

This gives you a plethora of statistics, including the robust estimates of scale, and it lists the extreme values in the data as candidate outliers. Here is a SAS-provided example of PROC UNIVARIATE and ROBUSTSCALE.

I initially calculated Q1, Q3, and the IQR to arrive at lower and upper bounds for outliers, following Tukey's method. But that alone didn't help me filter the outliers out of the data. On further exploration, I found that PROC UNIVARIATE uses the same Tukey method to give lower and upper bounds, and it also pinpoints the outliers. Very easy to follow.
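For anyone who wants the Tukey bounds as a flag on each row, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical data set `firms` with a numeric variable `roa` and a year variable `fyear` (all three names are placeholders), and it uses the conventional 1.5 multiplier. Computing the fences within each year is only one possible way to respect the panel structure.

```sas
/* Hypothetical input: data set FIRMS with variables FYEAR and ROA */
proc sort data=firms out=firms_sorted;
  by fyear;
run;

/* Quartiles of ROA within each fiscal year */
proc means data=firms_sorted noprint;
  by fyear;
  var roa;
  output out=quartiles(drop=_type_ _freq_) q1=q1 q3=q3;
run;

/* Flag values outside the Tukey fences Q1 - 1.5*IQR and Q3 + 1.5*IQR */
data firms_flagged;
  merge firms_sorted(in=in_firms) quartiles;
  by fyear;
  if in_firms;
  iqr   = q3 - q1;
  lower = q1 - 1.5*iqr;
  upper = q3 + 1.5*iqr;
  /* Missing ROA is never flagged */
  outlier = (not missing(roa)) and (roa < lower or roa > upper);
run;

/* List the flagged firm-years */
proc print data=firms_flagged;
  where outlier = 1;
run;
```

Keeping the flag as a variable, rather than deleting rows, leaves the decision about what to do with the flagged firm-years to a later step.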

See also these useful notes from @Rick_SAS:

Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"

*Editor's note: this is a popular topic. This original reply has been edited to incorporate multiple useful responses.*

7 REPLIES

Hi,

How about using PROC UNIVARIATE?

http://www2.sas.com/proceedings/sugi24/Infovis/p161-24.pdf

Thanks,

Shiva

Thanks Shiva,

I downloaded this article too.

I have read some academic articles, but it is unclear to me how these methods can be carried out in SAS.

For example:

1. An iterative method (refer: http://www.ecob-consulting.com/assets/pdf/outliers.pdf)

2. High-Breakdown Estimators (refer to attached: American Statistical Association Portal :: Multivariate Outlier Detection With High-Breakdown Estim...)

Does anyone know how to implement these outlier detection methods in SAS?

Thank you for any suggestions.

Regards,

mspak

**proc univariate** and **robustreg** offer some high-breakdown estimation (i.e., estimation that is insensitive to a certain amount of outlying observations).
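As a rough illustration of the ROBUSTREG route, here is a sketch under stated assumptions rather than a recommended model. The data set `firms` and the variables `roa`, `size`, and `lev` are hypothetical. `METHOD=LTS` requests the high-breakdown least trimmed squares estimator, and the `DIAGNOSTICS` and `LEVERAGE` options on the MODEL statement ask the procedure to report outliers and leverage points. Note that this treats the firm-years as a pooled sample and does nothing special for the panel structure.

```sas
/* Hypothetical data set FIRMS with response ROA and regressors SIZE and LEV */
proc robustreg data=firms method=lts;
  model roa = size lev / diagnostics leverage;
  /* OUTLIER= and LEVERAGE= add 0/1 indicator variables to the output data set */
  output out=robust_out outlier=is_outlier leverage=is_leverage;
run;

/* List observations flagged as outliers or leverage points */
proc print data=robust_out;
  where is_outlier = 1 or is_leverage = 1;
run;
```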

PG

Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"

I remember that Cody wrote a book, "Cody's Data Cleaning Techniques Using SAS", which describes some techniques for removing outliers.

First, use PROC RANK to group the values and drop the top 10% and the bottom 10% of the data. Then compute the mean and standard deviation of what remains and form the range (mean - 2*std, mean + 2*std). Values inside this range are the ones we keep.

NOTE: the multiplier 2 can be varied, depending on what percentage of the data you want to remove.
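The steps above might be sketched as follows. This is an illustration under assumptions, not code from the book: the data set `firms`, the variable `roa`, and the macro variable `k` are hypothetical names, and the 10% trimming is done with ten rank groups.

```sas
/* Hypothetical input: data set FIRMS with numeric variable ROA */
%let k = 2;  /* multiplier for the standard deviation; adjust as needed */

/* Assign each non-missing value to one of 10 groups (0-9) by rank */
proc rank data=firms out=ranked groups=10;
  var roa;
  ranks roa_decile;
run;

/* Mean and standard deviation after dropping the bottom and top groups */
proc means data=ranked noprint;
  where roa_decile between 1 and 8;
  var roa;
  output out=trimmed(drop=_type_ _freq_) mean=t_mean std=t_std;
run;

/* Flag values outside (mean - k*std, mean + k*std) */
data flagged;
  if _n_ = 1 then set trimmed;  /* trimmed statistics are retained on every row */
  set ranked;
  lower = t_mean - &k.*t_std;
  upper = t_mean + &k.*t_std;
  outlier = (not missing(roa)) and (roa < lower or roa > upper);
run;
```

For panel data, the same steps could be run with a BY statement on the year variable so that the range is computed separately for each year.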

Ksharp

Hi Ksharp,

I have a macro for the winsorization method (see the following code):

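```
%macro winsorize(dset, vars, oset);
  /* Winsorize one variable (&vars) at the 1st and 99th percentiles within each fiscal year. */
  /* As written, &vars must name a single variable. */
  proc sort data=&dset. out=temp; by fyear; run;

  proc means data=temp p1 p99 noprint;
    var &vars.;
    by fyear;
    output out=winvalue(drop=_type_ _freq_) p1=p1value p99=p99value;
  run;

  data &oset.;
    merge temp (in=x) winvalue;
    by fyear;
    if x;
    /* A missing value compares as less than any number in SAS, so skip missing values here. */
    if not missing(&vars.) and &vars. < p1value then &vars. = p1value;
    if &vars. > p99value then &vars. = p99value;
    drop p1value p99value;
  run;
%mend winsorize;
```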

I would like to use a method designed specifically for panel data analysis.

I am still searching, and I will share the code if I find anything useful.

Thank you.

mspak
