Hi,
The variable age in the dataset is in years. There is one observation that is 40 years, while the next closest is 25 years. How do I write code to remove the outlier from the variable age?
Is age supposed to be a sequence, or is it supposed to be from a certain distribution?
@gtucke1 wrote:
For the variable age, the question is, "How old is the patient in years?" I am keeping age as a continuous variable.
Why are you so sure that this age is an "outlier"? Just how large is your data set? A value that occurs once in 50 records may not be an "outlier", but it becomes more likely to be one if it occurs once in 50,000.
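Before deciding, it may help to look at how age is actually distributed. A minimal sketch, assuming the data set is named have and the variable is age (the names used in the code in this thread):

```sas
/* Summary of age: record count, missing count, and spread */
proc means data=have n nmiss min p25 median p75 max;
  var age;
run;

/* PROC UNIVARIATE prints an "Extreme Observations" table by default,
   listing the lowest and highest values of age */
proc univariate data=have;
  var age;
run;
```

If the value of 40 sits far from everything else in a large data set, that is some evidence it is unusual. In a small one it tells you much less.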
What do you want for a "removed outlier"? Remove the entire record? Remove the value (set to missing)? Replace the age with a different value?
One way. This creates a data set in which none of the values associated with that person would ever be used:
data want;
  set have;
  /* replace <limit> with the limit you want to use */
  if age > <limit> then delete;
run;
Another. This would mean that the value of age would not be used, but the other values for that record could be:
data want;
  set have;
  if age > <limit> then age = .;
run;
Which to choose depends on your needs/wants which have not been very clearly stated.
Adding to the suggestions by @ballardw, here is a third method, which caps age at the limit instead of removing it:
data want;
  set have;
  /* assign the limit value itself; "age=limit" would read a variable named limit */
  if age > <limit> then age = <limit>;
run;
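For anyone who wants to run the three approaches side by side, here is a sketch with the placeholder filled in. The cutoff of 30, the macro variable name limit, the output data set names, and the three-row sample data are all made up for illustration. They are not from the original poster's data.

```sas
/* Hypothetical cutoff, chosen only for illustration */
%let limit = 30;

/* Made-up sample standing in for the real HAVE data set */
data have;
  input id age;
  datalines;
1 22
2 25
3 40
;
run;

/* Option 1: drop the whole record */
data want_delete;
  set have;
  if age > &limit then delete;
run;

/* Option 2: keep the record, set age to missing */
data want_missing;
  set have;
  if age > &limit then age = .;
run;

/* Option 3: keep the record, cap age at the limit */
data want_capped;
  set have;
  if age > &limit then age = &limit;
run;
```

Records where age is already missing are left alone by all three steps, because SAS treats a missing numeric value as smaller than any number, so the comparison age > &limit is false for them.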
And a fourth method:
Do nothing; it's not an outlier. An age of 40 does not look like an outlier to me, but we don't know how you collected the data or what population you collected it from.
So, @gtucke1, to quote @ballardw: "Which to choose depends on your needs/wants which have not been very clearly stated." YOU have to decide what to do about a given outlier, and YOU have to state your needs. We can write the code, but we can't tell you what the right decision is.