BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fatemeh
Quartz | Level 8

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :

  1.  I do not understand these sections: " Q1 = &Q1 &val._P25;"," varL = &varL &val.L;", what sections in sas documentation explain theses parts ?
  2.  How can i detect outliers using this macro function?
  3. How can i change the lower and upper limit to 10percentile and 90percentile?
  4. How can i get only outliers that are larger than 90percentile?
%macro outliers(input=, var=, output= );

%let Q1=;
%let Q3=;
%let varL=;
%let varH=;

%let n=%sysfunc(countw(&var));
%do i= 1 %to &n;
%let val = %scan(&var,&i);
%let Q1 = &Q1 &val._P25;
%let Q3 = &Q3 &val._P75;
%let varL = &varL &val.L;
%let varH = &varH &val.H;
%end;

/* Calculate the quartiles and inter-quartile range using proc univariate */
proc means data=&input nway noprint;
var &var;
output out=temp P25= P75= / autoname;
run;

/* Extract the upper and lower limits into macro variables */
data temp;
set temp;
ID = 1;
array varb(&n) &Q1;
array varc(&n) &Q3;
array lower(&n) &varL;
array upper(&n) &varH;
do i = 1 to dim(varb);
lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));
end;
drop i _type_ _freq_;
run;

data temp1;
set &input;
ID = 1;
run;
data &output;
merge temp1 temp;
by ID;
array var(&n) &var;
array lower(&n) &varL;
array upper(&n) &varH;
do i = 1 to dim(var);
if not missing(var(i)) then do;
if var(i) >= lower(i) and var(i) <= upper(i);
end;
end;
drop &Q1 &Q3 &varL &varH ID i;
run;
%mend;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@fatemeh wrote:

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :

  1.  I do not understand these sections: " Q1 = &Q1 &val._P25;"," varL = &varL &val.L;", what sections in sas documentation explain theses parts ?

 


These are macro resolution and are literal text replacements, documented here. 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=mcrolref&docsetTarget=n07...

 

If you're not familiar with macros and macro variables, that should be your starting point. 

 

UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

 


@fatemeh wrote:
  1.  How can i detect outliers using this macro function?

This code removes outliers, it doesn't detect them per se. This line is known as a SUBSETTING IF and will only keep values within the interval. If you want to detect outliers, you could consider reversing the condition to only keep outliers or to flag those records someway instead. 

 

if var(i) >= lower(i) and var(i) <= upper(i);

The upper and lower bounds are calculated using this formula, not one I'm familiar with. 

lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));

@fatemeh wrote:
  1.  
  2. How can i change the lower and upper limit to 10percentile and 90percentile?

 


You can change the statistics calculated in the PROC MEANS statement and then need to change all the references going down the code to do that instead of using the rule in this code. If that were the case I would instead suggest you get your own NON MACRO code working and then work on converting your working code to a macro instead. This isn't a particularly efficient macro anyways. 

FYI - PROC RANK will easily group data into deciles and anything in the first and last groups would be your outliers for example. 

 

proc rank data=sashelp.cars out=cars_ranked groups =10;
var mpg_city;
rank rank_mpg_city;
run;

*first and last groups would be outliers;
proc freq data=cars_ranked;
table rank_mpg_city;
run;

@fatemeh wrote:
  1.  
  2. How can i get only outliers that are larger than 90percentile

I suggest using PROC RANK instead, it's much easier in your use case, this would isolate all the observations in the top decile. 

 

data cars_ranked_outliers;
set cars_ranked;
where rank_mpg_city = 9;
run;

Once you have your new code working here's a tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md


View solution in original post

2 REPLIES 2
Reeza
Super User

@fatemeh wrote:

Hello all,

Any help will be appreciated, below is the macro function sas code to detect outliers :

  1.  I do not understand these sections: " Q1 = &Q1 &val._P25;"," varL = &varL &val.L;", what sections in sas documentation explain theses parts ?

 


These are macro resolution and are literal text replacements, documented here. 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=mcrolref&docsetTarget=n07...

 

If you're not familiar with macros and macro variables, that should be your starting point. 

 

UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

 


@fatemeh wrote:
  1.  How can i detect outliers using this macro function?

This code removes outliers, it doesn't detect them per se. This line is known as a SUBSETTING IF and will only keep values within the interval. If you want to detect outliers, you could consider reversing the condition to only keep outliers or to flag those records someway instead. 

 

if var(i) >= lower(i) and var(i) <= upper(i);

The upper and lower bounds are calculated using this formula, not one I'm familiar with. 

lower(i) = varb(i) - 3 * (varc(i) - varb(i));
upper(i) = varc(i) + 3 * (varc(i) - varb(i));

@fatemeh wrote:
  1.  
  2. How can i change the lower and upper limit to 10percentile and 90percentile?

 


You can change the statistics calculated in the PROC MEANS statement and then need to change all the references going down the code to do that instead of using the rule in this code. If that were the case I would instead suggest you get your own NON MACRO code working and then work on converting your working code to a macro instead. This isn't a particularly efficient macro anyways. 

FYI - PROC RANK will easily group data into deciles and anything in the first and last groups would be your outliers for example. 

 

proc rank data=sashelp.cars out=cars_ranked groups =10;
var mpg_city;
rank rank_mpg_city;
run;

*first and last groups would be outliers;
proc freq data=cars_ranked;
table rank_mpg_city;
run;

@fatemeh wrote:
  1.  
  2. How can i get only outliers that are larger than 90percentile

I suggest using PROC RANK instead, it's much easier in your use case, this would isolate all the observations in the top decile. 

 

data cars_ranked_outliers;
set cars_ranked;
where rank_mpg_city = 9;
run;

Once you have your new code working here's a tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md


gcjfernandez
SAS Employee

If you have specific question about this Outlier macro please check with the author, Deepanshu Bhalla at https://www.listendata.com/2014/10/identify-and-remove-outliers-with-sas.html

and clarify your specific doubts.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 862 views
  • 5 likes
  • 3 in conversation