Riya88
Fluorite | Level 6

Hi SAS Experts -

I have a macro that calculates the acceptable range of a variable. The acceptable range is defined by:

Lower Limit = Q1 - 1.5*(Q3-Q1)

Upper Limit = Q3 + 1.5*(Q3-Q1)
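As a quick check of the two formulas, here is a minimal sketch with made-up quartiles (Q1 = 10 and Q3 = 20 are hypothetical values, not taken from any real data):

```sas
/* Hypothetical quartiles, used only to illustrate the arithmetic */
data _null_;
  q1 = 10;
  q3 = 20;
  iqr = q3 - q1;            /* inter-quartile range = 10 */
  lower = q1 - 1.5 * iqr;   /* 10 - 15 = -5 */
  upper = q3 + 1.5 * iqr;   /* 20 + 15 = 35 */
  put lower= upper=;        /* writes lower=-5 upper=35 to the log */
run;
```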

This is the boxplot method of identifying outliers. The macro works, but it is inefficient: it runs PROC UNIVARIATE and a capping DATA step once per variable inside a loop. I want PROC UNIVARIATE to run once for all the variables (not in a loop), save the output in a data set, and then cap all the variables with IF/THEN statements in a single DATA step.

Code:

options mprint symbolgen;

%macro outliers(input=, vars=, output= );
  /* Keep the working macro variables local to this macro */
  %local i n val QR Q1 Q3 ULimit LLimit;

  /* Work on a copy so the input data set is left untouched */
  data &output;
    set &input;
  run;

  %let n = %sysfunc(countw(&vars));

  %do i = 1 %to &n;
    %let val = %scan(&vars, &i);

    /* Calculate the quartiles and inter-quartile range using proc univariate */
    proc univariate data=&output noprint;
      var &val;
      output out=temp QRANGE=IQR Q1=First_Qtl Q3=Third_Qtl;
    run;

    /* Extract the quartiles and IQR into macro variables.
       SYMPUTX avoids the numeric-to-character conversion notes and leading blanks of SYMPUT. */
    data _null_;
      set temp;
      call symputx('QR', IQR);
      call symputx('Q1', First_Qtl);
      call symputx('Q3', Third_Qtl);
    run;

    %let ULimit = %sysevalf(&Q3 + 1.5 * &QR);
    %let LLimit = %sysevalf(&Q1 - 1.5 * &QR);

    /* Final dataset with outliers capped at the limits (no rows are dropped) */
    data &output;
      set &output;
      if &val < &LLimit then &val = &LLimit;
      if &val > &ULimit then &val = &ULimit;
    run;
  %end;
%mend;

%outliers(Input=abcd, Vars = a, output= test);

Thanks in anticipation!

1 REPLY 1
Astounding
PROC Star

I can outline an approach, but I don't have the time to give you all the details.

Consider this variation:

proc univariate data=&input noprint;
  var &vars;
  output out=ranges (keep=&vars) qrange=;
  output out=q1 (keep=&vars) q1=;
  output out=q3 (keep=&vars) q3=;
run;

That gives you three small output data sets (one observation apiece). You can investigate for yourself, but in the Q1 data set, each variable holds the Q1 value of the original variable with the same name.

Next step:  transpose the three data sets so you have two columns in each (for example, original variable name, and the Q1 value).  You're working with small data sets so the processing time will be minimal.
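A rough sketch of the transpose step, under some assumptions: so that it runs on its own, the three one-row data sets are built here with PROC MEANS on SASHELP.CLASS (a swapped-in stand-in, not the step above), and the names q1_t, q3_t, iqr_t and limits are made up for illustration.

```sas
/* Stand-in for the three one-row data sets; PROC MEANS is swapped in
   here only so that the sketch is runnable by itself. */
proc means data=sashelp.class noprint;
  var height weight;
  output out=q1     (drop=_type_ _freq_) q1=;
  output out=q3     (drop=_type_ _freq_) q3=;
  output out=ranges (drop=_type_ _freq_) qrange=;
run;

/* After transposing: one row per original variable,
   _NAME_ holds the variable name and COL1 the statistic. */
proc transpose data=q1     out=q1_t  (rename=(col1=q1));  run;
proc transpose data=q3     out=q3_t  (rename=(col1=q3));  run;
proc transpose data=ranges out=iqr_t (rename=(col1=iqr)); run;

/* Sort so the three tiny tables can be merged by variable name */
proc sort data=q1_t;  by _name_; run;
proc sort data=q3_t;  by _name_; run;
proc sort data=iqr_t; by _name_; run;

/* Hypothetical LIMITS table: one row per variable with both limits */
data limits;
  merge q1_t q3_t iqr_t;
  by _name_;
  lower = q1 - 1.5 * iqr;
  upper = q3 + 1.5 * iqr;
  keep _name_ lower upper;
run;
```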

With all three data sets transposed, use a DATA step to read them in and write out IF/THEN statements to a file.  Again, you're working with tiny data sets and the processing time will be minimal.
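Writing the statements out could look roughly like this. It is only a sketch: LIMITS_DEMO is a hypothetical table with one row per variable and made-up limits, and CAPCODE is an arbitrary fileref name.

```sas
/* Hypothetical limits table with made-up values, for illustration only */
data limits_demo;
  length _name_ $32;
  input _name_ $ lower upper;
  datalines;
height 42.5 82.5
weight 24 174
;
run;

/* Temporary file that will hold the generated IF/THEN statements */
filename capcode temp;

data _null_;
  set limits_demo;
  file capcode;
  length stmt $200;
  /* Lower cap; the MISSING() guard keeps missing values missing,
     because a missing value compares lower than any number in SAS. */
  stmt = catx(' ', 'if not missing(', _name_, ') and', _name_, '<',
              put(lower, best32.), 'then', _name_, '=', put(lower, best32.), ';');
  put stmt;
  /* Upper cap */
  stmt = catx(' ', 'if', _name_, '>', put(upper, best32.),
              'then', _name_, '=', put(upper, best32.), ';');
  put stmt;
run;
```

The MISSING() guard is a design choice of this sketch. A plain `if x < lower` comparison would also set missing values to the lower limit, so drop the guard only if that is what you want.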

Finally, %include the IF/THEN statements in a DATA step to perform the calculations.
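A minimal sketch of that last step, assuming the generated statements sit in a file behind a fileref. The fileref CAPDEMO, the two hard-coded statements and the output name TEST_DEMO are placeholders so that the example runs by itself on SASHELP.CLASS.

```sas
/* Placeholder file with two hand-written capping statements */
filename capdemo temp;

data _null_;
  file capdemo;
  put 'if not missing(height) and height < 55 then height = 55;';
  put 'if height > 70 then height = 70;';
run;

/* One pass over the data: every generated IF/THEN statement is applied here.
   SOURCE2 echoes the included lines in the log, which helps when debugging. */
data test_demo;
  set sashelp.class;
  %include capdemo / source2;
run;
```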

It is conceivable that ODS can save some of the work by producing an output data set with one row per variable and three statistics.  I'm not familiar enough with the possible ODS outputs from univariate to know.
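One way the ODS route might look, as a sketch only. It assumes the PROC UNIVARIATE ODS table named Quantiles has the columns VarName, Quantile and Estimate, with Quantile labels containing "Q1" and "Q3"; verify this with PROC PRINT on your release before relying on it. The data set names QUANT and LIMITS_ODS are made up.

```sas
/* Capture the quantiles table; NOPRINT is left off because it would
   suppress the ODS tables as well. */
ods select Quantiles;
ods output Quantiles=quant;

proc univariate data=sashelp.class;
  var height weight;
run;

ods select all;

/* Collapse to one row per variable with the two limits.
   Assumes the rows for each variable are contiguous in QUANT. */
data limits_ods;
  set quant;
  by VarName notsorted;
  retain q1 q3;
  if first.VarName then call missing(q1, q3);
  if index(Quantile, 'Q1') then q1 = Estimate;
  if index(Quantile, 'Q3') then q3 = Estimate;
  if last.VarName then do;
    lower = q1 - 1.5 * (q3 - q1);
    upper = q3 + 1.5 * (q3 - q1);
    output;
  end;
  keep VarName lower upper;
run;
```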

Good luck.
