BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
eawhit10
Fluorite | Level 6

Hello, I'm trying to output list of outliers for multiple numeric variables. Currently, I'm running the below code and merely replacing the vbox variable over and over again. Is there a macro to do this faster? 

My goal is outlier with ids to be output for all numeric variables.

 

I tried %LET numvar = var1 var2 var3 var4 var5... etc;

then in vbox &numvar. This is not working.

 

 

ods output sgplot=boxplot_data;
proc sgplot data=data_a;
vbox var 1 / datalabel=stid;
run;

ods excel file='C:..........xlsx'  ;
proc print data=boxplot_data;
run;
ods excel close;

 

data data_a;
input stid var1 var2 var3 var4 var5;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
blueskyxyz
Lapis Lazuli | Level 10
/**********************************************
*** purpose : do loop for each var 
*** writen by : blueskyxyz
*** date: 2023-Jan-7
/**********************************************/
data data_a;
input stid $ var1 var2 var3 var4 var5;
datalines;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;

/*split with blank space*/
%LET numvar = var1 var2 var3 var4 var5;
%let cnt=%eval(%sysfunc(countc(&numvar," "))+1);
%put &cnt;

%macro loop_box();
%do i= 1 %to &cnt;
	%let var=%scan(&numvar.,&i.);
	%put &var.;

	ods output sgplot=boxplot_data;
	proc sgplot data=data_a;
		vbox &var. / datalabel=stid;
	run;

	ods excel file="F:\Mysas\sas_code\&var..xlsx"  ;
	proc print data=boxplot_data;
	run;
	ods excel close;
%end;
%mend;

%loop_box();

View solution in original post

7 REPLIES 7
ballardw
Super User

It may help to define what you mean by "outlier".

 

An extension of your current approach would be reshaping the data, probably using Proc Transpose.

For your example: Note corrections to your data step so that it will run.

data data_a;
input stid $ var1 var2 var3 var4 var5;
datalines;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;

proc transpose data=data_a out=trans;
   by stid;
   var var1-var5;
run;

proc sgplot data=trans;
   vbox col1 /category=_name_ datalabel=stid;
run;

Depending on your data. You would have to sort the data by Stid if not already done so prior to the transpose.

You could place any numeric variables on the Var statement in the Proc Transpose code. Too many may make the data set hard to display in a single graph panel. So may want to use more than one transpose with different variables.

If you have duplicate values of your BY variable you may get more than one output variable for some variables.

This is using defaults for the transpose.

 

eawhit10
Fluorite | Level 6

I don't want to transpose, there are too many variables. And proc sgplot included outliers for the one variable specified. (I assume they are using the standard definition: Outliers = Observations > Q3 + 1.5*IQR  or < Q1 – 1.5*IQR)

 

My problem is that I want to find the outliers for more than one variable at once, not just doing the proc sgplots over and over for each variable. 

So can I run proc sgplot to spit out multiple box plots/summaries in the vbox line with a macro of some sort?

PaigeMiller
Diamond | Level 26

@eawhit10 wrote:

I don't want to transpose, there are too many variables. And proc sgplot included outliers for the one variable specified. (I assume they are using the standard definition: Outliers = Observations > Q3 + 1.5*IQR  or < Q1 – 1.5*IQR)

 

My problem is that I want to find the outliers for more than one variable at once, not just doing the proc sgplots over and over for each variable. 

So can I run proc sgplot to spit out multiple box plots/summaries in the vbox line with a macro of some sort?


I have given code that produces Q3, Q1 and IQR for all numeric variables at once. Macros are not needed.

 

 

 

 

--
Paige Miller
PaigeMiller
Diamond | Level 26

Use PROC MEANS which gives you output on many variables, all at once.

 

proc means noprint data=data_a stackods n min max mean q1 median q3 p95 p5;
ods output summary=summary;
var var1-var5;
run;

Agreeing with @ballardw — you need a definition of outlier. SAS will compute whatever statistics you want, but determining what is and is not an outlier involves additional effort beyond computing statistics.

 

Or use PROC UNIVARIATE which gives you even a wider range of possible outputs.

 

proc univariate data=data_a;
var var1-var5;
output out=univ_out mean=m1-m5 std=std1-std5
   p5=p5_1-p5_5 p95=p95_1-p95_5;
run;

 

--
Paige Miller
blueskyxyz
Lapis Lazuli | Level 10
/**********************************************
*** purpose : do loop for each var 
*** writen by : blueskyxyz
*** date: 2023-Jan-7
/**********************************************/
data data_a;
input stid $ var1 var2 var3 var4 var5;
datalines;
s1 0.1 1.9 2.7 4.6 3.9
s2 10.1 1.9 2.7 2.0 3.1
s3 0.1 1.2 2.4 3.9 6.9
s4 0.7 1.9 2.7 3.6 7.0
;

/*split with blank space*/
%LET numvar = var1 var2 var3 var4 var5;
%let cnt=%eval(%sysfunc(countc(&numvar," "))+1);
%put &cnt;

%macro loop_box();
%do i= 1 %to &cnt;
	%let var=%scan(&numvar.,&i.);
	%put &var.;

	ods output sgplot=boxplot_data;
	proc sgplot data=data_a;
		vbox &var. / datalabel=stid;
	run;

	ods excel file="F:\Mysas\sas_code\&var..xlsx"  ;
	proc print data=boxplot_data;
	run;
	ods excel close;
%end;
%mend;

%loop_box();
blueskyxyz
Lapis Lazuli | Level 10
please refer to the code I post , maybe that's you want

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1016 views
  • 6 likes
  • 5 in conversation