gtucke1
Fluorite | Level 6

Hi,

 

The variable age in the dataset is in years. One observation is 40 years, while the next closest is 25 years. How do I write code to remove the outlier from the variable age?

6 REPLIES
PGStats
Opal | Level 21

Is age supposed to be a sequence, or is it supposed to be from a certain distribution?

PG
gtucke1
Fluorite | Level 6
For the variable age, the question is, "How old is the patient in years?" I am keeping age as a continuous variable.
ballardw
Super User

@gtucke1 wrote:
For the variable age, the question is, "How old is the patient in years?" I am keeping age as a continuous variable.

Why are you so sure that age is an "outlier"? Just how large is your data set? Something that occurs once in 50 records may not be an "outlier" but becomes more likely to be so at once in 50,000.

 

What do you want for a "removed outlier"? Remove the entire record? Remove the value (set to missing)? Replace the age with a different value?
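Before deciding, it can help to look at how the 40 sits relative to the rest of the ages. The following is only a minimal sketch: it assumes the data set is named have and that age is a numeric variable, neither of which has been confirmed by the poster.

```sas
/* Assumption: the data set is called have and age is numeric */

/* Summary statistics: count, number missing, and the spread of age */
proc means data=have n nmiss min q1 median q3 max;
   var age;
run;

/* Default output includes a table of the lowest and highest values; */
/* the HISTOGRAM statement adds a plot of the distribution           */
proc univariate data=have;
   var age;
   histogram age;
run;
```

Output like this only shows how unusual the value is. Whether it counts as an outlier still depends on how the data were collected.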

gtucke1
Fluorite | Level 6
There are 596 observations. I don't know what to do with the outlier.
ballardw
Super User

One way:

This would create a data set with the entire record removed, so none of the values associated with that person would ever be used.

data want;
   set have;
   if age > <limit> then delete;   /* replace <limit> with the cutoff you want to use */
run;

Another way: this keeps the record but sets age to missing, so the value of age would not be used but other values could be.

Data want; 
   set have;
   if age > <limit> then age=.;
run;

Which to choose depends on your needs/wants which have not been very clearly stated.
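To make the difference between the two approaches concrete, here is a self-contained sketch that can be run as is. The data values, the id variable, the output data set names, and the cutoff of 30 are all made up for illustration and do not come from the original poster's data.

```sas
/* Made-up example data; ids and ages are illustrative only */
data have;
   input id age;
   datalines;
1 19
2 22
3 25
4 40
5 .
;
run;

%let limit = 30;   /* hypothetical cutoff; choose your own */

/* Option 1: drop the whole record when age exceeds the cutoff */
data want_drop;
   set have;
   if age > &limit then delete;
run;

/* Option 2: keep the record but set age to missing */
data want_missing;
   set have;
   if age > &limit then age = .;
run;
```

With this sample data, want_drop loses only the record with age 40, while want_missing keeps all five records and blanks that one age. The record whose age is already missing is untouched in both cases, because SAS treats a missing numeric value as smaller than any number, so the comparison age > &limit is false for it.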

PaigeMiller
Diamond | Level 26

Adding to the suggestions by @ballardw, here is a third method, which caps age at the limit:

 

Data want;
   set have;
   if age > <limit> then age=<limit>;   /* use the same cutoff value in both places */
run;

And a fourth method:

 

Do nothing; it's not an outlier. An age of 40 does not seem to me to be an outlier, but we don't know how you collected the data or what population you collected it from.

 

So, @gtucke1 to quote from @ballardw, "Which to choose depends on your needs/wants which have not been very clearly stated." YOU have to decide what to do about a given outlier, YOU have to state your needs. We can write the code, but we can't tell you what the right decision is.

--
Paige Miller

