ramkhatiwada
Calcite | Level 5

Hi,

Using PROC UNIVARIATE, I found that the following observations are outliers:

proc univariate;   /* no DATA= option, so this reads the most recently created data set */
  var resids;
  qqplot resids;
run;



Extreme Observations

        Lowest               Highest
    Value      Obs       Value      Obs
 -8.40972      188     5.12990      691
 -7.62763      211     5.12990      695
 -7.46829      570     5.12990      810
 -7.38851      367     6.26658      612
 -6.79448      588     7.25340      610

I want to remove all 10 of these observations from the data set. Is there any handy code for removing outliers? Thank you.

RW9
Diamond | Level 26

There isn't a quick way, but you could save the output from PROC UNIVARIATE and then use it to remove those observations:

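/* UNIVARIATE_OUTPUT: the saved extreme-observations list. VALUE_OBS is a   */
/* placeholder for a column of observation numbers that HAVE also carries. */
proc sql;
  delete from have
  where value_obs in (select value_obs from univariate_output);
quit;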

That would remove every observation whose number appears in the saved univariate output.
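A fuller sketch of the save-then-remove idea, assuming the data set is HAVE and the residual is RESIDS (hypothetical names; the block builds demo data so it runs on its own). It captures the Extreme Observations table with ODS OUTPUT. The columns LowObs and HighObs are the names I expect in that table, so confirm them with PROC CONTENTS DATA=extremes on your release. It also assumes PROC UNIVARIATE read HAVE directly, with no WHERE or BY statement, so that the Obs numbers match row positions in HAVE; I have not checked how they behave otherwise.

```sas
/* Hypothetical demo data standing in for the poster's data set. */
data have;
  call streaminit(2024);
  do i = 1 to 200;
    resids = rand('normal');
    output;
  end;
  drop i;
run;

/* Save the Extreme Observations table instead of only printing it. */
ods output ExtremeObs=extremes;
proc univariate data=have;
  var resids;
run;

/* Stack the lowest and highest observation numbers into one column.   */
/* LowObs/HighObs: assumed column names, check with PROC CONTENTS.      */
data drop_list;
  set extremes;
  obsnum = LowObs;  output;
  obsnum = HighObs; output;
  keep obsnum;
run;

/* In a plain SET step, _N_ is the row position in HAVE. */
data have_num;
  set have;
  obsnum = _n_;
run;

/* Keep only the rows whose number is not in the drop list. */
proc sql;
  create table want(drop=obsnum) as
  select * from have_num
  where obsnum not in (select obsnum from drop_list);
quit;
```

With the default five extremes per side, WANT ends up with 190 of the 200 demo rows. The thread's replies below explain why deleting these rows may not be what you want.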

 

ballardw
Super User


Are you sure that you want to remove observations? Removing an observation removes all of its other variables as well. Are the other variables on those records still useful for other purposes? You might be better served either by adding a flag variable that means "do not use variable x when the flag is 1 (or 0, your choice)" and excluding those records with WHERE statements or options, or by creating a new data set in which these values are set to missing.
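Both alternatives can be sketched as below, assuming a data set HAVE with the residual RESIDS and one other variable X (hypothetical names and demo data). The flag is set from the ten Obs numbers listed in the question, only to show the mechanics; it is not a recommendation that those rows are outliers.

```sas
/* Hypothetical demo data: 1000 rows, a residual plus another variable. */
data have;
  call streaminit(1);
  do id = 1 to 1000;
    x = rand('uniform');
    resids = rand('normal');
    output;
  end;
run;

/* Option 1: keep every row and add a flag (1 = do not use RESIDS). */
/* The numbers are the Obs values from the question's table.        */
data have_flagged;
  set have;
  resid_flag = (_n_ in (188 211 367 570 588 610 612 691 695 810));
run;

/* Analyses of RESIDS skip flagged rows with a WHERE statement; */
/* analyses of X can still use all 1000 rows.                   */
proc means data=have_flagged n mean;
  where resid_flag = 0;
  var resids;
run;

/* Option 2: a copy in which only the suspect RESIDS values are missing. */
data have_missing;
  set have_flagged;
  if resid_flag = 1 then call missing(resids);
run;
```

Either way no record is lost, so X on those ten rows remains available for other work.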

 

Also, PROC UNIVARIATE by default always shows the five largest and five smallest values. They are not automatically "outliers". You may very well have a value such as -6.79200 remaining in your data. Is that an outlier?
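The five-per-side count is a display setting, controlled by the NEXTROBS= option of PROC UNIVARIATE (default 5), not a test of anything. A small sketch on hypothetical demo data (the names HAVE, EXT3 and EXT8 are illustrative) asks for 3 and then 8 extremes from the same values.

```sas
/* Hypothetical demo data: 100 draws from a standard normal. */
data have;
  call streaminit(7);
  do i = 1 to 100;
    resids = rand('normal');
    output;
  end;
  drop i;
run;

/* Same data, 3 extremes listed per side. */
ods output ExtremeObs=ext3;
proc univariate data=have nextrobs=3;
  var resids;
run;

/* Same data, 8 extremes listed per side. */
ods output ExtremeObs=ext8;
proc univariate data=have nextrobs=8;
  var resids;
run;
```

If the listed rows were treated as outliers, the number of "outliers" would change with an option while the data stayed the same.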

 

Please run this example and tell me whether you actually think the five smallest and five largest values are "outliers".

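/* Ten observations, all with y = 1 */
data work.dummy;
   do x=1 to 10;
      y=1;
      output;
   end;
run;

/* The Extreme Observations table still lists five "lowest" and five "highest" */
proc univariate data=work.dummy;
   var y;
run;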
Reeza
Super User

Yeah, listing the most extreme values is not a good rule for identifying outliers.

Use a different criterion.
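One widely used criterion, not one proposed in this thread, is Tukey's fences: flag values more than 1.5 times the interquartile range below the first quartile or above the third. A minimal sketch, assuming the residuals are RESIDS in a data set HAVE (hypothetical names; demo data with two planted extremes). Whether 1.5 is the right multiplier, or whether an IQR rule suits regression residuals at all, is a judgment call for your data.

```sas
/* Hypothetical demo data: 200 normal residuals plus two planted extremes. */
data have;
  call streaminit(42);
  do i = 1 to 200;
    resids = rand('normal');
    output;
  end;
  resids = 15;  output;
  resids = -15; output;
  drop i;
run;

/* First and third quartiles of the residuals. */
proc means data=have noprint;
  var resids;
  output out=quart q1=q1 q3=q3;
run;

/* Tukey's fences: flag values beyond 1.5*IQR from the quartiles. */
data flagged;
  if _n_ = 1 then set quart(keep=q1 q3);   /* values are retained for all rows */
  set have;
  iqr   = q3 - q1;
  lower = q1 - 1.5*iqr;
  upper = q3 + 1.5*iqr;
  /* Missing values compare as smaller than any number, so exclude them. */
  outlier = (not missing(resids)) and (resids < lower or resids > upper);
  drop iqr;
run;
```

The sketch only flags rows; the flag can then drive a WHERE statement instead of deleting anything.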

