sas_search
Calcite | Level 5

Hi team,

I am working on detecting outliers in a payroll data set, where I have to detect the outliers on a weekly basis. Kindly help me write an algorithm in SAS to automate this.

For reference, below is an example of the data in which I have to find outliers.

Calls 425 555 235 452 654 785 452 555 665 245 324 245 365 452 754 854 546 456 652 345 246 356 124 758 111 954 845

Thank you

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

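Just to get things started, you can do something like this:

data have;
    input Calls @@;   /* trailing @@ reads several values from each data line */
    datalines;
425 555 235 452 654 785 452 555 665
245 324 245 365 452 754 854 546 456
652 345 246 356 124 758 111 954 845
;

/* Write the interquartile range and the mean to a one-row data set */
proc univariate data=have;
    var Calls;
    output out=stats qrange=iqr mean=mean;
run;

data Outliers;
    /* Read the statistics once; their values are retained for every row of HAVE */
    if _n_ = 1 then set stats;
    set have;
    /* Subsetting IF: keep only values outside mean +/- 1.5*IQR */
    if Calls lt mean - 1.5*iqr or Calls gt mean + 1.5*iqr;
run;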
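
The code above centers the range on the mean. A common alternative is to build the range from the quartiles instead (Tukey's fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR). The following is only a sketch of that variant, assuming the HAVE data set created above; the variable names lower and upper are illustrative and not from the thread.

```sas
/* Sketch only: quartile-based fences instead of mean-based ones.
   Assumes the HAVE data set with the numeric variable Calls. */
proc univariate data=have noprint;
    var Calls;
    output out=stats q1=q1 q3=q3 qrange=iqr;
run;

data Outliers;
    if _n_ = 1 then set stats;      /* one-row STATS values are retained on every row */
    set have;
    lower = q1 - 1.5*iqr;           /* illustrative name: lower fence */
    upper = q3 + 1.5*iqr;           /* illustrative name: upper fence */
    if Calls < lower or Calls > upper;   /* keep only values outside the fences */
run;
```

Changing the multiplier 1.5 to a larger value, such as 3, makes the rule stricter, so fewer values are flagged.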

8 REPLIES
PeterClemmensen
Tourmaline | Level 20

Hi. There are several ways to do outlier detection in SAS. Do a Google search and you will find tons of examples. However, it all depends on what you consider an outlier.

Ksharp
Super User

How do you define outliers? As values outside the range [mean-3*std, mean+3*std], or [mean-2*std, mean+2*std], or something else?
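
For illustration, a standard-deviation rule like the one described here could be sketched as follows. This assumes a data set named have with a numeric variable Calls, as in the accepted solution; the macro variable k is a hypothetical name introduced only to switch between the 3*std and 2*std ranges.

```sas
/* Sketch only: flag values outside mean +/- k*std.
   Assumes a data set HAVE with the numeric variable Calls. */
%let k = 3;   /* hypothetical multiplier; set to 2 for the narrower range */

proc means data=have noprint;
    var Calls;
    output out=stats mean=mean std=std;
run;

data Outliers;
    if _n_ = 1 then set stats(keep=mean std);   /* drop _TYPE_ and _FREQ_ */
    set have;
    if Calls < mean - &k*std or Calls > mean + &k*std;
run;
```

Because the mean and standard deviation are themselves pulled by extreme values, this rule can behave differently from a quartile-based one on the same data.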

ballardw
Super User

@Ksharp wrote:

How do you define outliers? As values outside the range [mean-3*std, mean+3*std], or [mean-2*std, mean+2*std], or something else?


By the amount of change from the previous value? Above a set value? Below a set value?

Ksharp
Super User

Above a set value OR below a set value.

sas_search
Calcite | Level 5

An outlier is an out-of-range value, that is, a value that is higher or lower by more than 3 times the interquartile range (IQR).

PeterClemmensen
Tourmaline | Level 20

Do you mean less than mean - 3*IQR or greater than mean + 3*IQR?
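
Since the original question asks for detection on a weekly basis, one possible shape for a per-week 3*IQR check is sketched below. Several things here are assumptions, not facts from the thread: the thread shows no week variable, so the Week assignment (9 consecutive values per week) is purely illustrative, and the data set names payroll and week_stats are hypothetical. The thread also does not settle whether the range is centered on the mean or built from the quartiles; this sketch assumes the quartiles.

```sas
/* Sketch only. Illustrative assumption: each block of 9 consecutive
   values in HAVE is one week. Real data would carry its own week or date. */
data payroll;
    set have;
    Week = ceil(_n_ / 9);
run;

/* One row of statistics per week. PAYROLL is already in Week order here;
   sort by Week first if real data is not. */
proc univariate data=payroll noprint;
    by Week;
    var Calls;
    output out=week_stats q1=q1 q3=q3 qrange=iqr;
run;

/* Attach each week's statistics to that week's rows and keep the outliers */
data Outliers;
    merge payroll week_stats;
    by Week;
    if Calls < q1 - 3*iqr or Calls > q3 + 3*iqr;
run;
```

To use the mean-centered version from the question above instead, request mean=mean in the OUTPUT statement and replace q1 and q3 with mean in the final condition.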

sas_search
Calcite | Level 5

Hi,

Thank you for the solution. I got it.
