- Outlier detection


04-15-2012 12:26 AM

Good day to all,

I am doing corporate finance research using panel (i.e., longitudinal) data. All the observations are firm-years.

I would like to detect outliers before running any regressions. I have read some of the SAS articles, but none of them describes a method designed specifically for panel data.

Could anyone point me to useful references (books, articles, or macros) on outlier detection for panel data using SAS?

Thank you in advance for any help and suggestions.

Regards,

MSPAK

Accepted Solutions

Solution

05-23-2017
12:54 PM


Posted in reply to mspak

04-13-2017 07:04 AM - last edited on 05-23-2017 12:53 PM by ChrisHemedinger

Hi, mine is a belated reply, as I only recently started specializing in statistics, especially with SAS. I came to this thread to solve a problem similar to yours and, with the help given by different members, I managed to solve mine. This may be of help to you, if you have not already solved it in the intervening years.

```
/* Replace your_dataset and varname with your own data set and variable names */
proc univariate data=your_dataset robustscale plot;
  var varname;
run;
```

This gives you a wealth of statistics, including the robust estimates of scale, and lists the extreme values in the data. Here is a SAS-provided example of PROC UNIVARIATE and ROBUSTSCALE.

I initially calculated Q1, Q3, and the IQR to arrive at lower and upper bounds for outliers following Tukey's method, but that alone didn't help me filter the outliers out of the data. On further exploration, I found that PROC UNIVARIATE uses the same Tukey method to give the lower and upper bounds and also pinpoints the outliers. Very easy to follow.
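For readers who want to apply the Tukey bounds themselves, here is a minimal sketch. The data set `firms`, the variable `roa`, and the multiplier 1.5 are illustrative assumptions, not taken from the post above.

```sas
/* Hypothetical example data: one numeric variable with an obvious outlier */
data firms;
  input roa @@;
  datalines;
0.05 0.07 0.04 0.06 0.08 0.05 0.90 0.06 0.03 0.07
;
run;

/* Quartiles of the variable */
proc means data=firms noprint;
  var roa;
  output out=q q1=q1 q3=q3;
run;

/* Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] */
data flagged;
  if _n_ = 1 then set q(keep=q1 q3);   /* quartiles are retained on every row */
  set firms;
  iqr   = q3 - q1;
  lower = q1 - 1.5*iqr;
  upper = q3 + 1.5*iqr;
  outlier = (not missing(roa)) and (roa < lower or roa > upper);
run;

proc print data=flagged;
  where outlier;
run;
```

For panel data, the same steps could be run within each year by sorting and adding a `by fyear;` statement to PROC MEANS, then merging the quartiles back by year.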

See also these useful notes from @Rick_SAS:

Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"

*Editor's note: this is a popular topic. This original reply has been edited to incorporate multiple useful responses.*

All Replies


Posted in reply to mspak

04-15-2012 12:48 AM

Hi,

How about using PROC UNIVARIATE?

http://www2.sas.com/proceedings/sugi24/Infovis/p161-24.pdf

Thanks,

Shiva


Posted in reply to shivas

04-15-2012 01:09 AM

Thanks Shiva,

I downloaded this article too.

I have read some academic articles, but it is unclear how to carry out these methods in SAS.

For example:

1. An iterative method (refer: http://www.ecob-consulting.com/assets/pdf/outliers.pdf)

2. High-Breakdown Estimators (refer to attached: American Statistical Association Portal :: Multivariate Outlier Detection With High-Breakdown Estim...)

Does anyone have an idea of how to implement these outlier detection methods in SAS?

Thank you for any suggestions.

Regards,

mspak


Posted in reply to mspak

04-15-2012 12:40 PM

**proc univariate** and **robustreg** offer some high breakdown (i.e. insensitive to a certain amount of outlying observations) estimation.
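As a rough illustration of the ROBUSTREG route, the sketch below fits an MM regression and writes outlier and leverage flags to a data set. The simulated data set `panel`, its variables, and the flag names `is_outlier` and `is_leverage` are assumptions made up for this example.

```sas
/* Hypothetical simulated data with five contaminated responses */
data panel;
  call streaminit(1);
  do i = 1 to 100;
    x1 = rand("Normal");
    x2 = rand("Normal");
    y  = 1 + 2*x1 - x2 + rand("Normal");
    if i > 95 then y = y + 20;   /* shift the last five rows */
    output;
  end;
run;

/* MM estimation is a high-breakdown method; DIAGNOSTICS and LEVERAGE
   request the outlier and leverage-point diagnostics */
proc robustreg data=panel method=mm;
  model y = x1 x2 / diagnostics leverage;
  output out=diag outlier=is_outlier leverage=is_leverage;
run;

/* List the observations flagged by either diagnostic */
proc print data=diag;
  where is_outlier = 1 or is_leverage = 1;
run;
```

This treats the data as a pooled cross-section; it does not account for the firm-year structure asked about in the original question.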

PG



Posted in reply to mspak

04-16-2012 09:00 AM

Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"
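A minimal sketch of MCD-based flagging in SAS/IML follows. It is not taken from the linked articles. It assumes the `Sashelp.Class` sample data, default MCD options, and a conventional 97.5% chi-square cutoff, and it assumes that the second row of the distance matrix returned by the MCD call holds the robust distances; check these against the SAS/IML documentation before relying on them.

```sas
proc iml;
  /* Sample data shipped with SAS; any numeric matrix could be used instead */
  use Sashelp.Class;
  read all var {Height Weight} into x;
  close Sashelp.Class;

  p = ncol(x);
  optn = j(8, 1, .);                     /* missing values request the defaults */
  call mcd(sc, coef, dist, optn, x);     /* robust location and scatter */

  robustDist = dist[2, ];                /* assumed: row 2 = robust distances */
  cutoff = sqrt(quantile("ChiSquare", 0.975, p));
  outIdx = loc(robustDist > cutoff);     /* row numbers exceeding the cutoff */

  if ncol(outIdx) > 0 then print outIdx;
  else print "No observations exceed the cutoff";
quit;
```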


Posted in reply to mspak

04-17-2012 04:22 AM

I remember that Cody wrote a book, 'Cleaning Data Skill', which describes some techniques for removing outliers.

First, use PROC RANK to group the values and delete the top 10% and bottom 10% of the data. Then calculate the mean and standard deviation of the remaining data to get the range (mean - 2*std, mean + 2*std); this is the range of values to keep.

NOTE: the multiplier 2 can vary, depending on what percentage of the data you want to remove.
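The trimming approach described above might be sketched as follows. The data set `have`, the variable `x`, and the other names are hypothetical, and this is one reading of the steps rather than the code from the book.

```sas
/* Hypothetical example data with two obvious outliers (95 and -60) */
data have;
  input x @@;
  datalines;
10 12 11 13 9 10 14 12 11 10 13 12 11 9 10 12 95 11 13 -60
;
run;

/* Assign each value to a decile group (0 = lowest, 9 = highest) */
proc rank data=have out=ranked groups=10;
  var x;
  ranks decile;
run;

/* Mean and standard deviation after dropping the bottom and top 10% */
proc means data=ranked noprint;
  where decile not in (0, 9);
  var x;
  output out=stats mean=m std=s;
run;

/* Flag values outside (mean - 2*std, mean + 2*std) of the trimmed data */
data flagged;
  if _n_ = 1 then set stats(keep=m s);
  set have;
  outlier = (not missing(x)) and (x < m - 2*s or x > m + 2*s);
run;

proc print data=flagged;
  where outlier;
run;
```

Because the mean and standard deviation come from the trimmed data, extreme values do not inflate the bounds used to judge them.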

Ksharp


Posted in reply to Ksharp

04-17-2012 09:00 AM

Hi Ksharp,

I have a macro for the winsorization method (see the following code):

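```
%macro winsorize(dset, vars, oset);
  /* Winsorize one variable (&vars) at the 1st and 99th percentiles within each fiscal year */
  data temp; set &dset; run;
  proc sort data=temp; by fyear; run;

  /* Yearly 1st and 99th percentiles */
  proc means data=temp p1 p99 noprint;
    var &vars.;
    by fyear;
    output out=winvalue p1=p1value p99=p99value;
  run;

  data temp1; merge temp (in=x) winvalue; by fyear; if x; run;

  data &oset.; set temp1;
    /* Skip missing values: SAS treats a missing value as smaller than any number */
    if not missing(&vars.) then do;
      if &vars. < p1value then &vars. = p1value;
      if &vars. > p99value then &vars. = p99value;
    end;
  run;
%mend winsorize;
```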
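A possible way to call the macro, assuming it has already been compiled in the session. The simulated data set `firms` and the variable `roa` are hypothetical; only `fyear` is required by the macro as written, and a single variable is passed because the macro compares `&vars.` directly against one pair of percentiles.

```sas
/* Hypothetical simulated firm-year data: 500 firms over three fiscal years */
data firms;
  call streaminit(2012);
  do fyear = 2009 to 2011;
    do firm_id = 1 to 500;
      roa = rand("Normal", 0.05, 0.10);
      output;
    end;
  end;
run;

/* Winsorize ROA within each fiscal year and write the result to FIRMS_W */
%winsorize(firms, roa, firms_w);

/* After winsorizing, the yearly minimum and maximum should match P1 and P99 */
proc means data=firms_w min p1 p99 max;
  class fyear;
  var roa;
run;
```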

I am looking for a method designed specifically for panel data analysis.

I am still searching, and I will share the code if I find anything useful.

Thank you.

mspak
