Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- detect outliers using macro function

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-07-2021 03:19 PM
(947 views)

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :` `

- I do not understand these sections: "
`Q1 = &Q1 &val._P25;","`

what sections in sas documentation explain theses parts`varL = &varL &val.L;`

",`?`

- How can i detect outliers using this macro function?
- How can i change the lower and upper limit to 10percentile and 90percentile?
- How can i get only outliers that are larger than 90percentile?

```
%macro outliers(input=, var=, output= );
%let Q1=;
%let Q3=;
%let varL=;
%let varH=;
%let n=%sysfunc(countw(&var));
%do i= 1 %to &n;
%let val = %scan(&var,&i);
%let Q1 = &Q1 &val._P25;
%let Q3 = &Q3 &val._P75;
%let varL = &varL &val.L;
%let varH = &varH &val.H;
%end;
/* Calculate the quartiles and inter-quartile range using proc univariate */
proc means data=&input nway noprint;
var &var;
output out=temp P25= P75= / autoname;
run;
/* Extract the upper and lower limits into macro variables */
data temp;
set temp;
ID = 1;
array varb(&n) &Q1;
array varc(&n) &Q3;
array lower(&n) &varL;
array upper(&n) &varH;
do i = 1 to dim(varb);
lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));
end;
drop i _type_ _freq_;
run;
data temp1;
set &input;
ID = 1;
run;
data &output;
merge temp1 temp;
by ID;
array var(&n) &var;
array lower(&n) &varL;
array upper(&n) &varH;
do i = 1 to dim(var);
if not missing(var(i)) then do;
if var(i) >= lower(i) and var(i) <= upper(i);
end;
end;
drop &Q1 &Q3 &varL &varH ID i;
run;
%mend;
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@fatemeh wrote:

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :

- I do not understand these sections: "
`Q1 = &Q1 &val._P25;","`

what sections in sas documentation explain theses parts`varL = &varL &val.L;`

",`?`

These are macro resolution and are literal text replacements, documented here.

If you're not familiar with macros and macro variables, that should be your starting point.

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

@fatemeh wrote:

`How can i detect outliers using this macro function?`

This code removes outliers, it doesn't detect them per se. This line is known as a SUBSETTING IF and will only keep values within the interval. If you want to detect outliers, you could consider reversing the condition to only keep outliers or to flag those records someway instead.

`if var(i) >= lower(i) and var(i) <= upper(i);`

The upper and lower bounds are calculated using this formula, not one I'm familiar with.

```
lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));
```

@fatemeh wrote:

- How can i change the lower and upper limit to 10percentile and 90percentile?

You can change the statistics calculated in the PROC MEANS statement and then need to change all the references going down the code to do that instead of using the rule in this code. If that were the case I would instead suggest you get your own NON MACRO code working and then work on converting your working code to a macro instead. This isn't a particularly efficient macro anyways.

FYI - PROC RANK will easily group data into deciles and anything in the first and last groups would be your outliers for example.

```
proc rank data=sashelp.cars out=cars_ranked groups =10;
var mpg_city;
rank rank_mpg_city;
run;
*first and last groups would be outliers;
proc freq data=cars_ranked;
table rank_mpg_city;
run;
```

@fatemeh wrote:

- How can i get only outliers that are larger than 90percentile

I suggest using PROC RANK instead, it's much easier in your use case, this would isolate all the observations in the top decile.

```
data cars_ranked_outliers;
set cars_ranked;
where rank_mpg_city = 9;
run;
```

Once you have your new code working here's a tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

2 REPLIES 2

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@fatemeh wrote:

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :

- I do not understand these sections: "
`Q1 = &Q1 &val._P25;","`

what sections in sas documentation explain theses parts`varL = &varL &val.L;`

",`?`

These are macro resolution and are literal text replacements, documented here.

If you're not familiar with macros and macro variables, that should be your starting point.

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

@fatemeh wrote:

`How can i detect outliers using this macro function?`

This code removes outliers, it doesn't detect them per se. This line is known as a SUBSETTING IF and will only keep values within the interval. If you want to detect outliers, you could consider reversing the condition to only keep outliers or to flag those records someway instead.

`if var(i) >= lower(i) and var(i) <= upper(i);`

The upper and lower bounds are calculated using this formula, not one I'm familiar with.

```
lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));
```

@fatemeh wrote:

- How can i change the lower and upper limit to 10percentile and 90percentile?

You can change the statistics calculated in the PROC MEANS statement and then need to change all the references going down the code to do that instead of using the rule in this code. If that were the case I would instead suggest you get your own NON MACRO code working and then work on converting your working code to a macro instead. This isn't a particularly efficient macro anyways.

FYI - PROC RANK will easily group data into deciles and anything in the first and last groups would be your outliers for example.

```
proc rank data=sashelp.cars out=cars_ranked groups =10;
var mpg_city;
rank rank_mpg_city;
run;
*first and last groups would be outliers;
proc freq data=cars_ranked;
table rank_mpg_city;
run;
```

@fatemeh wrote:

- How can i get only outliers that are larger than 90percentile

I suggest using PROC RANK instead, it's much easier in your use case, this would isolate all the observations in the top decile.

```
data cars_ranked_outliers;
set cars_ranked;
where rank_mpg_city = 9;
run;
```

Once you have your new code working here's a tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you have specific question about this Outlier macro please check with the author, Deepanshu Bhalla at https://www.listendata.com/2014/10/identify-and-remove-outliers-with-sas.html

and clarify your specific doubts.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.