Hello, I'm trying to output list of outliers for multiple numeric variables. Currently, I'm running the below code and merely replacing the vbox variable over and over again. Is there a macro to do this faster?
My goal is outlier with ids to be output for all numeric variables.
I tried %LET numvar = var1 var2 var3 var4 var5... etc;
then in vbox &numvar. This is not working.
ods output sgplot=boxplot_data;
proc sgplot data=data_a;
vbox var 1 / datalabel=stid;
run;
ods excel file='C:..........xlsx' ;
proc print data=boxplot_data;
run;
ods excel close;
data data_a;
input stid var1 var2 var3 var4 var5;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;
/**********************************************
*** purpose : do loop for each var
*** writen by : blueskyxyz
*** date: 2023-Jan-7
/**********************************************/
data data_a;
input stid $ var1 var2 var3 var4 var5;
datalines;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;
/*split with blank space*/
%LET numvar = var1 var2 var3 var4 var5;
%let cnt=%eval(%sysfunc(countc(&numvar," "))+1);
%put &cnt;
%macro loop_box();
%do i= 1 %to &cnt;
%let var=%scan(&numvar.,&i.);
%put &var.;
ods output sgplot=boxplot_data;
proc sgplot data=data_a;
vbox &var. / datalabel=stid;
run;
ods excel file="F:\Mysas\sas_code\&var..xlsx" ;
proc print data=boxplot_data;
run;
ods excel close;
%end;
%mend;
%loop_box();
It may help to define what you mean by "outlier".
An extension of your current approach would be reshaping the data, probably using Proc Transpose.
For your example: Note corrections to your data step so that it will run.
data data_a; input stid $ var1 var2 var3 var4 var5; datalines; s1 0.1 1.9 2.7 4.6 3.9 s2 10.1 1.9 2.7 2.0 3.1 s3 0.1 1.2 2.4 3.9 6.9 s4 0.7 1.9 2.7 3.6 7.0 ; proc transpose data=data_a out=trans; by stid; var var1-var5; run; proc sgplot data=trans; vbox col1 /category=_name_ datalabel=stid; run;
Depending on your data. You would have to sort the data by Stid if not already done so prior to the transpose.
You could place any numeric variables on the Var statement in the Proc Transpose code. Too many may make the data set hard to display in a single graph panel. So may want to use more than one transpose with different variables.
If you have duplicate values of your BY variable you may get more than one output variable for some variables.
This is using defaults for the transpose.
I don't want to transpose, there are too many variables. And proc sgplot included outliers for the one variable specified. (I assume they are using the standard definition: Outliers = Observations > Q3 + 1.5*IQR or < Q1 – 1.5*IQR)
My problem is that I want to find the outliers for more than one variable at once, not just doing the proc sgplots over and over for each variable.
So can I run proc sgplot to spit out multiple box plots/summaries in the vbox line with a macro of some sort?
@eawhit10 wrote:
I don't want to transpose, there are too many variables. And proc sgplot included outliers for the one variable specified. (I assume they are using the standard definition: Outliers = Observations > Q3 + 1.5*IQR or < Q1 – 1.5*IQR)
My problem is that I want to find the outliers for more than one variable at once, not just doing the proc sgplots over and over for each variable.
So can I run proc sgplot to spit out multiple box plots/summaries in the vbox line with a macro of some sort?
I have given code that produces Q3, Q1 and IQR for all numeric variables at once. Macros are not needed.
Use PROC MEANS which gives you output on many variables, all at once.
proc means noprint data=data_a stackods n min max mean q1 median q3 p95 p5;
ods output summary=summary;
var var1-var5;
run;
Agreeing with @ballardw — you need a definition of outlier. SAS will compute whatever statistics you want, but determining what is and is not an outlier involves additional effort beyond computing statistics.
Or use PROC UNIVARIATE which gives you even a wider range of possible outputs.
proc univariate data=data_a;
var var1-var5;
output out=univ_out mean=m1-m5 std=std1-std5
p5=p5_1-p5_5 p95=p95_1-p95_5;
run;
/**********************************************
*** purpose : do loop for each var
*** writen by : blueskyxyz
*** date: 2023-Jan-7
/**********************************************/
data data_a;
input stid $ var1 var2 var3 var4 var5;
datalines;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;
/*split with blank space*/
%LET numvar = var1 var2 var3 var4 var5;
%let cnt=%eval(%sysfunc(countc(&numvar," "))+1);
%put &cnt;
%macro loop_box();
%do i= 1 %to &cnt;
%let var=%scan(&numvar.,&i.);
%put &var.;
ods output sgplot=boxplot_data;
proc sgplot data=data_a;
vbox &var. / datalabel=stid;
run;
ods excel file="F:\Mysas\sas_code\&var..xlsx" ;
proc print data=boxplot_data;
run;
ods excel close;
%end;
%mend;
%loop_box();
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.