BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ujjawal
Quartz | Level 8

Hi Team,

I am using a macro that cap outliers based on box-plot / turkey method. But the problem arises PROC MEANS truncates variable names and the same problem happens with PROC LOGISTIC.

Example variable name : prod_INVT_OLD_BAL_AVG_ACT_1_3M

This causes my macro to not working properly.

%macro box(input=, vars=, output= );

%let QR=;

%let Q1=;

%let Q3=;

%let varL=;

%let varH=;

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

/* Calculate the quartiles and inter-quartile range using proc univariate */

proc means data=&output nway noprint;

var &vars;

output out=temp QRANGE = P25= P75= / autoname;

run;

/* Extract the upper and lower limits into macro variables */

data temp;

set temp;

ID = 1;

array vara(&n)  &QR;

array varb(&n)  &Q1;

array varc(&n)  &Q3;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vara);

lower(i) = varb(i) - 3 * vara(i);

upper(i) = varc(i) + 3 * vara(i);

end;

drop i _type_ _freq_;

run;

data temp1;

set &input;

ID = 1;

run;

data &output;

merge temp1 temp;

by ID;

array vars(&n) &vars;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vars);

if not missing(vars(i)) then do;

if vars(i) < lower(i) then vars(i) = lower(i);

if vars(i) > upper(i) then vars(i) = upper(i);

end;

end;

drop &QR &Q1 &Q3 &varL &varH ID i;

run;

%mend;

%box(input=abcd, vars = a b c d, output= test);

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

This is not an easy solution, but it is very scalable and flexible.  You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

   var &vars;
   output out=qrange (keep=&vars) qrange=;

   output out=p25 (keep=&vars) p25=;

   output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics.  To aggregate them later, you would need to read them one data set at a time:

data stats;

   array vars {&n} &vars;

   array ranges {&n} range1-range25;

   array q1 {&n} q1_1 - q1_&n;

   array q3 {&n} q3_1 - q3_&n;

   set qrange;

   do _i_=1 to &n;

      ranges{_i_} = vars{_i_};

   end;

   set p25;

   do _i_=1 to &n;

      q1{_i_} = vars{_i_};

   end;

   set p75;

   do _i_=1 to &n;

      q3{_i_} = vars{_i_};

   end;

   ... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names.  Good luck.

View solution in original post

7 REPLIES 7
ballardw
Super User

Do you know what the maximum length of a SAS variable name is? When you provide a 30 character variable and use autoname option to add enough characters to exceed that number of characters what should SAS do for a result?

This will be a problem with any fixed length limit of variable names though:

prod_INVT_OLD_BAL_AVG_ACT_1_3M

could be

ProdInvtOldBalAvgAct_1_3M

and save a few characters without losing information. Not that I want to type either of those very often.

Ujjawal
Quartz | Level 8

I know the variable names in SAS can have a maximum length of 32 characters. I am asking for a solution. I cannot remove "_" from a variable name. Thanks for your time!

Astounding
PROC Star

This is not an easy solution, but it is very scalable and flexible.  You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

   var &vars;
   output out=qrange (keep=&vars) qrange=;

   output out=p25 (keep=&vars) p25=;

   output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics.  To aggregate them later, you would need to read them one data set at a time:

data stats;

   array vars {&n} &vars;

   array ranges {&n} range1-range25;

   array q1 {&n} q1_1 - q1_&n;

   array q3 {&n} q3_1 - q3_&n;

   set qrange;

   do _i_=1 to &n;

      ranges{_i_} = vars{_i_};

   end;

   set p25;

   do _i_=1 to &n;

      q1{_i_} = vars{_i_};

   end;

   set p75;

   do _i_=1 to &n;

      q3{_i_} = vars{_i_};

   end;

   ... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names.  Good luck.

ballardw
Super User

What do you want in the output: The complete long variable name or the full suffix for the statistics requested with the autoname option? With variables named prod_INVT_OLD_BAL_AVG_ACT_1_3M you can't have both.

You can get lists of the variables generated by proc means like this:

proc sql ;

   select NAME into : QR separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'QRANGE')>0;

   select NAME into : Q1 separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'P25')>0;

   select NAME into : Q3 separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'P75')>0;

quit;

Assuming none of the variables have the key words QRANGE, P25 or P75 as part of their names.

You wouldn't need the

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

any more.

Tom
Super User Tom
Super User

Since you are using a macro you can make your own names for the variables you generate.

Call them VAR1 to VAR20 or SAM, JOE, FRED.

Reeza
Super User

Your code has several mistakes beyond your stated issue, that would be more concerning to me:

1.  proc means references the &output dataset when it should probably reference the &input dataset.

2.  The following would only keep each macro variable retaining the last values, i.e. it doesn't do what you think it does.

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

EDIT:

I think this would have been your original source for the macro:

SAS Macro : Capping Outliers - Listen Data

Here's a quick sample I wrote that works differently. Unless you have huge data where multiple passes on the data is an issue a modification of this method should work well.

SAS Box-Plot/Tukey Method of Capping Outliers

*Calculate IQR and first/third quartiles;

proc means data=sashelp.class stackods n qrange p25 p75;

var weight height;

ods output summary=ranges;

run;

*create data with outliers to check;

data capped;

set sashelp.class;

if name='Alfred' then weight=220;

if name='Jane' then height=-30;

run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;

set &dset;

if &var>&upper then &var=&upper;

if &var<&lower then &var=&lower;

run;

%mend;

*create cutoffs and execute macro for each variable;

data cutoffs;

set ranges;

lower=p25-3*qrange;

upper=p75+3*qrange;

string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");

call execute(string);

run;

Ujjawal
Quartz | Level 8

Thank you so much. It's a nice code but it will be very heavy as my dataset consists of 1 million observations and 1K variables. But i learnt a lot of tricks from your code. Yes, i have copied code from that site. I have written an email to the author regd issues you highlighted just above. God bless you! 🙂

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3765 views
  • 4 likes
  • 5 in conversation