Programming the statistical procedures from SAS

Outlier detection

Regular Contributor
Posts: 162

Good day to all,

I am doing corporate finance research in which panel data (i.e. longitudinal data) is collected. All the observations are firm-years.

I wish to detect outliers before running any regressions. I have read some of the SAS articles, but I could not find a single method that addresses the needs of panel data.

I am writing to ask whether there are any useful references (i.e. books, articles, or macros) on outlier detection for panel data (i.e. longitudinal data) using SAS.

Thank you in advance for any help and suggestions.

Regards,

MSPAK


Accepted Solutions
Solution
‎05-23-2017 12:54 PM
Occasional Learner
Posts: 1

Re: Outlier detection


Hi, mine is a belated reply, as I started specializing in statistics, especially with SAS, only recently. I came to this forum to solve a problem similar to the one you posed and, fortunately, with the help given by different members, I managed to solve mine. This may be of help to you, provided you have not already solved it in the years since :)

 

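proc univariate data=your_data robustscale plot;  /* your_data: placeholder for your data set name */
var varname;                                      /* varname: the variable to screen for outliers */
run;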

 

 

This gives you a plethora of statistics, including the robust scale estimates, and lists the extreme values in the data as outlier candidates.  Here is a SAS-provided example of PROC UNIVARIATE and ROBUSTSCALE.
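To work with these tables afterwards rather than only reading the listing, one option is to capture them with ODS OUTPUT. This is a minimal sketch, not the poster's own code: sashelp.cars and its MSRP variable are stand-ins for your own data, and the output data set names (extremes, robust) are hypothetical.

```sas
/* Capture the extreme observations and the robust scale estimates as data sets. */
/* sashelp.cars / msrp are stand-ins; extremes and robust are hypothetical names. */
ods output ExtremeObs=extremes RobustScale=robust;
proc univariate data=sashelp.cars robustscale;
  var msrp;
  id model;   /* label each extreme observation with the car model */
run;

proc print data=extremes; run;
proc print data=robust; run;
```

For firm-year data, sorting by the year variable and adding a BY statement should give one set of tables per year.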

 

I initially calculated q1, q3 and iqr to arrive at lower and upper bounds for outliers following the Tukey method, but that alone didn't help me filter the outliers out of the data. On further exploration, I found that PROC UNIVARIATE uses the same Tukey method to give the lower and upper bounds, and in addition it pinpoints the outliers. Very easy to follow.
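For readers who do want to apply the Tukey bounds by hand, the calculation can be sketched as follows. This is an illustration under assumptions, not the poster's code: sashelp.cars and MSRP stand in for your data, the 1.5 multiplier is the conventional Tukey choice, and the data set and variable names (fences, flagged, lower, upper, outlier) are hypothetical.

```sas
/* Tukey fences: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. */
proc means data=sashelp.cars noprint;
  var msrp;
  output out=fences(drop=_type_ _freq_) q1=q1 q3=q3 qrange=iqr;
run;

data flagged;
  if _n_ = 1 then set fences;   /* one row of quartiles, retained for every observation */
  set sashelp.cars(keep=make model msrp);
  lower = q1 - 1.5*iqr;
  upper = q3 + 1.5*iqr;
  /* missing values are never flagged */
  outlier = (not missing(msrp)) and (msrp < lower or msrp > upper);
run;

proc print data=flagged;
  where outlier = 1;
run;
```

Keeping the flag as a 0/1 variable, rather than deleting rows, makes it easy to rerun a regression with and without the flagged observations.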

 

See also these useful notes from @Rick_SAS:



Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"

4) Winsorization: the good, the bad, and the ugly


Editor's note: this is a popular topic.  This original reply has been edited to incorporate multiple useful responses.    



All Replies
Super Contributor
Posts: 349

Re: Outlier detection

Hi,

How about using PROC UNIVARIATE?

http://www2.sas.com/proceedings/sugi24/Infovis/p161-24.pdf

Thanks,

Shiva

Regular Contributor
Posts: 162

Re: Outlier detection

Thanks Shiva,

I downloaded this article too.

I have read some academic articles, but it is not clear to me how these methods can be carried out in SAS.

For example:

1. An iterative method (refer: http://www.ecob-consulting.com/assets/pdf/outliers.pdf)

2. High-Breakdown Estimators (refer to attached: American Statistical Association Portal :: Multivariate Outlier Detection With High-Breakdown Estim...)

Does anyone have an idea of how to implement these outlier detection methods in SAS?

Thank you for any suggestions.

Regards,

mspak

Respected Advisor
Posts: 4,606

Re: Outlier detection

proc univariate and robustreg offer some high breakdown (i.e. insensitive to a certain amount of outlying observations) estimation.
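As a rough illustration of the ROBUSTREG route, the sketch below fits an MM regression and writes outlier and leverage flags to a data set. The data (sashelp.cars), the model, and the output names (robout, is_outlier, is_leverage, std_resid) are assumptions chosen only to make the example runnable; they are not from this thread.

```sas
/* MM estimation is one of the high-breakdown methods available in PROC ROBUSTREG. */
proc robustreg data=sashelp.cars method=mm;
  model msrp = horsepower weight / diagnostics leverage;
  output out=robout outlier=is_outlier leverage=is_leverage sresidual=std_resid;
run;

/* List the observations that the robust fit flags as outliers. */
proc print data=robout;
  where is_outlier = 1;
  var make model msrp std_resid is_leverage;
run;
```

The flags come from the robust fit itself, so a handful of extreme firm-years should not mask one another the way they can with ordinary least-squares residuals.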

PG

PG
SAS Super FREQ
Posts: 3,307

Re: Outlier detection

Grand Advisor
Posts: 9,452

Re: Outlier detection

I remember that Cody wrote a book, "Cody's Data Cleaning Techniques Using SAS", which describes some techniques for removing outliers.

First, use PROC RANK to group the values and delete the upper 10% and lower 10% of the data. Then calculate the mean and standard deviation of what remains; the range (mean-2*std, mean+2*std) is the range we need.

NOTE: the multiplier 2 can vary, depending on what percentage of the data you want to remove.
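The steps above can be sketched as follows. This is one reading of the description, not code taken from the book: sashelp.cars and MSRP stand in for your data, and the names ranked, trimmed, flagged_trim, t_mean, t_std and the macro variable k are hypothetical.

```sas
/* Step 1: assign each value to one of 10 rank groups (0 = lowest 10%, 9 = highest 10%). */
proc rank data=sashelp.cars(keep=make model msrp) groups=10 out=ranked;
  var msrp;
  ranks msrp_grp;
run;

/* Step 2: mean and standard deviation of the middle 80% only. */
proc means data=ranked noprint;
  where msrp_grp between 1 and 8;
  var msrp;
  output out=trimmed(drop=_type_ _freq_) mean=t_mean std=t_std;
run;

/* Step 3: flag values outside trimmed mean +/- k * trimmed std. */
%let k = 2;   /* multiplier; adjust to control how much is flagged */
data flagged_trim;
  if _n_ = 1 then set trimmed;   /* retain the trimmed statistics on every row */
  set ranked;
  outlier = (not missing(msrp)) and
            (msrp < t_mean - &k.*t_std or msrp > t_mean + &k.*t_std);
run;
```

Because the standard deviation of trimmed data is smaller than that of the full data, a multiplier of 2 will typically flag more observations here than it would with untrimmed statistics.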

Ksharp

Regular Contributor
Posts: 162

Re: Outlier detection

Hi Ksharp,

I have a macro for the winsorization method (see the following code):

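%macro winsorize(dset, vars, oset);
/* Winsorize one variable (&vars.) at the 1st and 99th percentiles within each fiscal year. */
proc sort data=&dset. out=temp; by fyear; run;

/* Yearly 1st and 99th percentiles */
proc means data=temp p1 p99 noprint;
  var &vars.;
  by fyear;
  output out=winvalue p1=p1value p99=p99value;
run;

/* Attach the yearly cutoffs and cap the values */
data &oset.;
  merge temp (in=x) winvalue;
  by fyear;
  if x;
  /* Skip missing values: in SAS a missing value compares lower than any number */
  if not missing(&vars.) then do;
    if &vars. < p1value then &vars. = p1value;
    else if &vars. > p99value then &vars. = p99value;
  end;
run;
%mend winsorize;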
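A call to the macro might look like the sketch below. The data set firms, the variable roa, and the simulated values are hypothetical and exist only to make the example runnable; the macro itself requires a variable named fyear in the input.

```sas
/* Hypothetical firm-year data: 3 fiscal years x 200 firms, with a simulated ROA. */
data firms;
  call streaminit(12345);
  do fyear = 2001 to 2003;
    do firm_id = 1 to 200;
      roa = rand('normal', 0.05, 0.10);
      output;
    end;
  end;
run;

%winsorize(firms, roa, firms_w);

/* Compare the range of ROA by year before and after winsorizing. */
proc means data=firms min max; class fyear; var roa; run;
proc means data=firms_w min max; class fyear; var roa; run;
```

As written, the macro handles one variable per call, since P1= and P99= each name a single output variable.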

I would like to use a method designed specifically for panel data analysis.

I am still searching, and I will share the code if I find anything useful.

Thank you.

mspak
