mspak
Quartz | Level 8

Good day to all,

I am doing corporate finance research using panel (i.e., longitudinal) data. Each observation is a firm-year.

I wish to detect outliers before running any regressions. I have read some of the SAS articles, but I could not find a single method that addresses the needs of panel data.

Could anyone point me to useful references (books, articles, or macros) on outlier detection for panel (longitudinal) data using SAS?

Thank you in advance for any help and suggestions.

Regards,

MSPAK

1 ACCEPTED SOLUTION

Subbu70
Calcite | Level 5

Hi, mine is a belated reply, as I only recently started specializing in statistics, especially with SAS. I came to this forum to solve a problem similar to the one you posed, and fortunately, with the help given by different members, I managed to solve mine. This may be of help to you, provided you have not already solved it in the intervening years :).

 

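proc univariate data=have robustscale plot;  /* have: placeholder, use your own data set name */
var varname;                                 /* varname: the variable to screen for outliers */
run;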

 

 

This gives you a plethora of statistics, including robust measures of scale (ROBUSTSCALE), and it lists the lowest and highest observations in the data as extreme values, which is where outliers show up. Here is a SAS-provided example of PROC UNIVARIATE and ROBUSTSCALE.

 

I initially calculated Q1, Q3, and the IQR to arrive at lower and upper bounds for outliers, following Tukey's method. But that alone didn't help me filter the outliers out of the data. On further exploration, I found that PROC UNIVARIATE uses the same Tukey method to give lower and upper bounds, and it also pinpoints the outliers. Very easy to follow.
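
The Tukey rule mentioned above can also be applied by hand in a few steps. The following is only a minimal sketch, not the poster's own code: the data set name have and the variable name x are placeholders, and the multiplier 1.5 is the conventional choice rather than anything stated in this thread.

```sas
/* Step 1: compute the quartiles of x (have and x are placeholder names) */
proc means data=have noprint;
   var x;
   output out=q(keep=q1 q3) q1=q1 q3=q3;
run;

/* Step 2: attach the fences to every row and flag values outside them */
data flagged;
   if _n_ = 1 then set q;            /* q1 and q3 are retained on every row */
   set have;
   iqr   = q3 - q1;
   lower = q1 - 1.5*iqr;             /* lower Tukey fence */
   upper = q3 + 1.5*iqr;             /* upper Tukey fence */
   outlier = (not missing(x)) and (x < lower or x > upper);
run;

/* Step 3: list the flagged observations */
proc print data=flagged;
   where outlier = 1;
run;
```

For firm-year data, the same idea can be applied year by year: sort by the year variable, add a BY statement to PROC MEANS, and merge the quartiles back by year instead of reading the one-row data set.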

 

See also these useful notes from @Rick_SAS:



Yes, you can do MCD estimation with SAS. Here are a few articles:

1) Detecting Outliers in SAS: Part 3

2) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician" (beginning on p. 7)

3) You might also enjoy reading "The curse of dimensionality: How to define outliers in high-dimensional data?"

4) Winsorization: the good, the bad, and the ugly
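
As a rough illustration of the MCD approach mentioned above, the sketch below calls the MCD subroutine in SAS/IML and flags observations whose robust distance exceeds a chi-square cutoff. It assumes that SAS/IML is licensed and that the data contain no missing values. The data set have, the variables x1, x2, and x3, and the 0.975 quantile are placeholders and conventions chosen for illustration; they are not taken from the articles listed above.

```sas
proc iml;
   use have;                               /* have, x1-x3: placeholder names */
   read all var {x1 x2 x3} into X;         /* assumes no missing values */
   close have;

   p = ncol(X);                            /* number of variables */
   optn = j(8, 1, .);                      /* missing = use the default options */
   optn[1] = 0;                            /* suppress printed output */
   call MCD(sc, est, dist, optn, X);       /* robust location and scatter */

   robustDist = dist[2, ];                 /* row 2 holds the robust distances */
   cutoff = sqrt( quantile("ChiSquare", 0.975, p) );
   outlierIdx = loc(robustDist > cutoff);  /* row numbers of flagged observations */
   if ncol(outlierIdx) > 0 then print outlierIdx;
quit;
```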


Editor's note: this is a popular topic.  This original reply has been edited to incorporate multiple useful responses.    


7 REPLIES
shivas
Pyrite | Level 9

Hi,

How about using PROC UNIVARIATE?

http://www2.sas.com/proceedings/sugi24/Infovis/p161-24.pdf

Thanks,

Shiva

mspak
Quartz | Level 8

Thanks Shiva,

I downloaded this article too.

I have read some academic articles, but it is not clear to me how to carry out these methods in SAS.

For example:

1. An iterative method (refer: http://www.ecob-consulting.com/assets/pdf/outliers.pdf)

2. High-Breakdown Estimators (refer to attached: American Statistical Association Portal :: Multivariate Outlier Detection With High-Breakdown Estim...)

Does anyone have an idea of how to carry out these outlier detection methods in SAS?

Thank you for any suggestions.

Regards,

mspak

PGStats
Opal | Level 21

PROC UNIVARIATE and PROC ROBUSTREG offer some high-breakdown estimation (i.e., estimation that is insensitive to a certain proportion of outlying observations).
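
A minimal sketch of how PROC ROBUSTREG could be used for this, assuming a regression of a response y on two regressors x1 and x2 in a data set named have (all placeholder names, not from this thread). METHOD=LTS requests least trimmed squares, which is one of the high-breakdown methods the procedure offers; the output variable names is_outlier and is_leverage are likewise made up for illustration.

```sas
/* have, y, x1, x2 are placeholder names */
proc robustreg data=have method=lts;
   model y = x1 x2 / diagnostics leverage;   /* print outlier and leverage diagnostics */
   output out=diag outlier=is_outlier        /* 1 = flagged as an outlier */
                   leverage=is_leverage;     /* 1 = flagged as a leverage point */
run;

/* List the observations flagged by either diagnostic */
proc print data=diag;
   where is_outlier = 1 or is_leverage = 1;
run;
```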

PG
Ksharp
Super User

I remember that Cody wrote a book on data cleaning (I believe the title is "Cody's Data Cleaning Techniques Using SAS"), which describes some techniques for removing outliers.

First, use PROC RANK to group the values and delete the top 10% and bottom 10% of the data. Then calculate the mean and standard deviation of what remains to get a range (mean - 2*std, mean + 2*std). That range is what we need.

NOTE: the multiplier 2 can vary, depending on what percentage of the data you want to remove.
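
The steps above could be sketched as follows. This is an illustration under assumptions, not code from Cody's book: the data set have, the variable x, and the names x_decile, t_mean, and t_std are all placeholders.

```sas
/* have and x are placeholder names */
/* Step 1: assign each value of x to a decile group, 0 (lowest) to 9 (highest) */
proc rank data=have out=ranked groups=10;
   var x;
   ranks x_decile;
run;

/* Step 2: mean and standard deviation of the middle 80% only */
proc means data=ranked noprint;
   where x_decile between 1 and 8;
   var x;
   output out=trimmed(keep=t_mean t_std) mean=t_mean std=t_std;
run;

/* Step 3: flag values in the full data that fall outside mean +/- 2*std */
data flagged;
   if _n_ = 1 then set trimmed;      /* t_mean and t_std are retained on every row */
   set ranked;
   outlier = (not missing(x)) and
             (x < t_mean - 2*t_std or x > t_mean + 2*t_std);
run;
```

Because the standard deviation is computed after trimming, it understates the spread of the full data, so a multiplier larger than 2 may be more appropriate in practice.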

Ksharp

mspak
Quartz | Level 8

Hi Ksharp,

I have a macro for the winsorization method (see the following code):

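%macro winsorize(dset, vars, oset);
/* Winsorizes ONE numeric variable (&vars.) at the 1st and 99th percentiles within each fyear */
proc sort data=&dset. out=temp; by fyear; run;

/* Yearly 1st and 99th percentiles */
proc means data=temp p1 p99 noprint;
var &vars.;
by fyear;
output out=winvalue(drop=_type_ _freq_) p1=p1value p99=p99value;
run;

/* Merge the cutoffs back by year and cap the values */
data &oset.(drop=p1value p99value);
merge temp (in=x) winvalue;
by fyear;
if x;
/* Skip missing values: in SAS a missing value compares lower than any number */
if not missing(&vars.) then do;
if &vars. < p1value then &vars. = p1value;
else if &vars. > p99value then &vars. = p99value;
end;
run;
%mend winsorize;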
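
A hypothetical call of the macro above might look like the sketch below. The names firmdata, roa, and firmdata_w are placeholders that do not appear in this thread; the input data set must contain the year variable fyear, and the macro as written handles one variable per call.

```sas
/* Hypothetical call: firmdata, roa, and firmdata_w are placeholder names */
%winsorize(firmdata, roa, firmdata_w);

/* Compare the range of roa before and after winsorizing, by year */
proc means data=firmdata   min max; class fyear; var roa; run;
proc means data=firmdata_w min max; class fyear; var roa; run;
```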

I am looking for a method designed specifically for panel data analysis.

I am still searching, and I will share any useful code I find.

Thank you.

mspak
