Help using Base SAS procedures

Removing outliers

New Contributor
Posts: 3

Hi,

Using PROC UNIVARIATE, I found that the following observations are outliers:

proc univariate;
var resids;
qqplot resids;
run;



SAS Output

Extreme Observations

      Lowest             Highest
   Value    Obs       Value    Obs
-8.40972    188     5.12990    691
-7.62763    211     5.12990    695
-7.46829    570     5.12990    810
-7.38851    367     6.26658    612
-6.79448    588     7.25340    610

I want to remove all 10 of these observations from the data set. Is there any handy code for outlier removal? Thank you.

Super User
Posts: 9,599

Re: Removing outliers

Posted in reply to ramkhatiwada

There isn't a quick way, but you could save the output from PROC UNIVARIATE to a data set, then use it to remove those observations:

proc sql;
  delete from have
  where value_obs in (select value_obs from univariate_output);
quit;

That would remove every observation whose number appears in the univariate output.
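To make the idea above concrete, here is a minimal sketch, not a tested recipe. It assumes the input data set is called WORK.HAVE (the original post does not name it) and that the ODS table behind "Extreme Observations" is named ExtremeObs with columns LowObs and HighObs; it is worth confirming those names with ODS TRACE ON or PROC CONTENTS before relying on them. WORK.EXTREMES, WORK.DROP_OBS, WORK.NUMBERED and WORK.WANT are made-up names.

```sas
/* Save the Extreme Observations table (assumed ODS name: ExtremeObs) */
ods output ExtremeObs=work.extremes;
proc univariate data=work.have;      /* WORK.HAVE is an assumed name */
   var resids;
run;

/* Stack the lowest and highest observation numbers into one column */
data work.drop_obs;
   set work.extremes;
   obs = LowObs;  output;            /* assumed column names */
   obs = HighObs; output;
   keep obs;
run;

/* PROC SQL has no row number, so add one in a DATA step first */
data work.numbered;
   set work.have;
   obs = _n_;
run;

/* Keep every row whose number is not in the list */
proc sql;
   create table work.want(drop=obs) as
   select *
   from work.numbered
   where obs not in (select obs from work.drop_obs);
quit;
```

This writes a new data set instead of deleting from the original, so nothing is lost if the list turns out to be wrong. The observation numbers only line up if WORK.HAVE is in the same order it was in when PROC UNIVARIATE ran.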

 

Super User
Posts: 13,517

Re: Removing outliers

Posted in reply to ramkhatiwada

@ramkhatiwada wrote:

Hi,

Using PROC UNIVARIATE, I found that the following observations are outliers:

proc univariate;
var resids;
qqplot resids;
run;

SAS Output

Extreme Observations

      Lowest             Highest
   Value    Obs       Value    Obs
-8.40972    188     5.12990    691
-7.62763    211     5.12990    695
-7.46829    570     5.12990    810
-7.38851    367     6.26658    612
-6.79448    588     7.25340    610

I want to remove all 10 of these observations from the data set. Is there any handy code for outlier removal? Thank you.


Are you sure that you want to remove observations? Deleting an observation also deletes its values for every other variable. Are the other variables on those records still useful for other purposes? You might be better served either by adding a flag variable that means "do not use variable x when the flag is 1" (or 0, your choice) and filtering on it with WHERE= data set options, or by creating a new data set in which these values are set to missing.
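A rough sketch of both options, assuming the input data set is named WORK.HAVE and using the ten observation numbers from the output in the question. The names WORK.FLAGGED, exclude and resids_clean are invented for illustration.

```sas
/* Option 1: flag the rows; option 2: copy the variable and blank it out */
data work.flagged;
   set work.have;                    /* WORK.HAVE is an assumed name */

   /* exclude = 1 marks a residual that analyses should skip */
   exclude = (_n_ in (188, 211, 570, 367, 588, 691, 695, 810, 612, 610));

   /* Keep the row but set only this one value to missing */
   resids_clean = resids;
   if exclude then resids_clean = .;
run;

/* The WHERE= data set option filters rows without deleting anything */
proc means data=work.flagged(where=(exclude = 0));
   var resids;
run;
```

Either way the original values stay in the data set, so the choice can be revisited later. Hard-coding observation numbers is fragile if the data are ever re-sorted, so an ID variable would be a safer key if one exists.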

 

Also, PROC UNIVARIATE by default always shows the five lowest and five highest values. They are not automatically "outliers". You may very well have values such as -6.79200 remaining in your data. Is that an outlier?

 

Please run this example data and tell me if you actually think the five smallest and largest values are "outliers".

data work.dummy;
   do x=1 to 10;
   y=1;
   output;
   end;
run;

proc univariate data=work.dummy;
   var y;
run;

Super User
Posts: 23,698

Re: Removing outliers

Posted in reply to ramkhatiwada

Yeah, that's not a good rule for identifying outliers.

Use different logic.
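As one illustration of "different logic" (not something proposed in this thread), a commonly used rule flags values more than 1.5 * IQR below the first quartile or above the third. The sketch below assumes the data set is WORK.HAVE; WORK.QUARTILES, WORK.IQR_FLAGGED and the variable outlier are hypothetical names, and 1.5 is a convention, not a requirement.

```sas
/* Compute the quartiles of resids once */
proc means data=work.have noprint;   /* WORK.HAVE is an assumed name */
   var resids;
   output out=work.quartiles q1=q1 q3=q3;
run;

/* Flag values outside the 1.5 * IQR fences; nothing is deleted */
data work.iqr_flagged;
   if _n_ = 1 then set work.quartiles(keep=q1 q3);   /* values are retained */
   set work.have;
   iqr = q3 - q1;
   outlier = (not missing(resids)) and
             (resids < q1 - 1.5*iqr or resids > q3 + 1.5*iqr);
   drop q1 q3 iqr;
run;
```

Whether a flagged value should really be excluded is still a judgment call that depends on the data and the model.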
