Help using Base SAS procedures

Variable Name Truncation

Accepted Solution Solved
Reply
Regular Contributor
Posts: 183
Accepted Solution

Variable Name Truncation

Hi Team,

I am using a macro that cap outliers based on box-plot / turkey method. But the problem arises PROC MEANS truncates variable names and the same problem happens with PROC LOGISTIC.

Example variable name : prod_INVT_OLD_BAL_AVG_ACT_1_3M

This causes my macro to not working properly.

%macro box(input=, vars=, output= );

%let QR=;

%let Q1=;

%let Q3=;

%let varL=;

%let varH=;

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

/* Calculate the quartiles and inter-quartile range using proc univariate */

proc means data=&output nway noprint;

var &vars;

output out=temp QRANGE = P25= P75= / autoname;

run;

/* Extract the upper and lower limits into macro variables */

data temp;

set temp;

ID = 1;

array vara(&n)  &QR;

array varb(&n)  &Q1;

array varc(&n)  &Q3;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vara);

lower(i) = varb(i) - 3 * vara(i);

upper(i) = varc(i) + 3 * vara(i);

end;

drop i _type_ _freq_;

run;

data temp1;

set &input;

ID = 1;

run;

data &output;

merge temp1 temp;

by ID;

array vars(&n) &vars;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vars);

if not missing(vars(i)) then do;

if vars(i) < lower(i) then vars(i) = lower(i);

if vars(i) > upper(i) then vars(i) = upper(i);

end;

end;

drop &QR &Q1 &Q3 &varL &varH ID i;

run;

%mend;

%box(input=abcd, vars = a b c d, output= test);


Accepted Solutions
Solution
‎04-28-2015 04:28 PM
Super User
Posts: 5,505

Re: Variable Name Truncation

This is not an easy solution, but it is very scalable and flexible.  You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

   var &vars;
   output out=qrange (keep=&vars) qrange=;

   output out=p25 (keep=&vars) p25=;

   output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics.  To aggregate them later, you would need to read them one data set at a time:

data stats;

   array vars {&n} &vars;

   array ranges {&n} range1-range25;

   array q1 {&n} q1_1 - q1_&n;

   array q3 {&n} q3_1 - q3_&n;

   set qrange;

   do _i_=1 to &n;

      ranges{_i_} = vars{_i_};

   end;

   set p25;

   do _i_=1 to &n;

      q1{_i_} = vars{_i_};

   end;

   set p75;

   do _i_=1 to &n;

      q3{_i_} = vars{_i_};

   end;

   ... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names.  Good luck.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Variable Name Truncation

Do you know what the maximum length of a SAS variable name is? When you provide a 30 character variable and use autoname option to add enough characters to exceed that number of characters what should SAS do for a result?

This will be a problem with any fixed length limit of variable names though:

prod_INVT_OLD_BAL_AVG_ACT_1_3M

could be

ProdInvtOldBalAvgAct_1_3M

and save a few characters without losing information. Not that I want to type either of those very often.

Regular Contributor
Posts: 183

Re: Variable Name Truncation

I know the variable names in SAS can have a maximum length of 32 characters. I am asking for a solution. I cannot remove "_" from a variable name. Thanks for your time!

Solution
‎04-28-2015 04:28 PM
Super User
Posts: 5,505

Re: Variable Name Truncation

This is not an easy solution, but it is very scalable and flexible.  You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

   var &vars;
   output out=qrange (keep=&vars) qrange=;

   output out=p25 (keep=&vars) p25=;

   output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics.  To aggregate them later, you would need to read them one data set at a time:

data stats;

   array vars {&n} &vars;

   array ranges {&n} range1-range25;

   array q1 {&n} q1_1 - q1_&n;

   array q3 {&n} q3_1 - q3_&n;

   set qrange;

   do _i_=1 to &n;

      ranges{_i_} = vars{_i_};

   end;

   set p25;

   do _i_=1 to &n;

      q1{_i_} = vars{_i_};

   end;

   set p75;

   do _i_=1 to &n;

      q3{_i_} = vars{_i_};

   end;

   ... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names.  Good luck.

Super User
Posts: 11,343

Re: Variable Name Truncation

What do you want in the output: The complete long variable name or the full suffix for the statistics requested with the autoname option? With variables named prod_INVT_OLD_BAL_AVG_ACT_1_3M you can't have both.

You can get lists of the variables generated by proc means like this:

proc sql ;

   select NAME into : QR separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'QRANGE')>0;

   select NAME into : Q1 separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'P25')>0;

   select NAME into : Q3 separated by ' '

   from dictionary.columns

   where memname='TEMP' and Libname='WORK'

      and index(upcase(NAME),'P75')>0;

quit;

Assuming none of the variables have the key words QRANGE, P25 or P75 as part of their names.

You wouldn't need the

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

any more.

Super User
Super User
Posts: 7,050

Re: Variable Name Truncation

Since you are using a macro you can make your own names for the variables you generate.

Call them VAR1 to VAR20 or SAM, JOE, FRED.

Super User
Posts: 19,810

Re: Variable Name Truncation

Your code has several mistakes beyond your stated issue, that would be more concerning to me:

1.  proc means references the &output dataset when it should probably reference the &input dataset.

2.  The following would only keep each macro variable retaining the last values, i.e. it doesn't do what you think it does.

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

EDIT:

I think this would have been your original source for the macro:

SAS Macro : Capping Outliers - Listen Data

Here's a quick sample I wrote that works differently. Unless you have huge data where multiple passes on the data is an issue a modification of this method should work well.

SAS Box-Plot/Tukey Method of Capping Outliers

*Calculate IQR and first/third quartiles;

proc means data=sashelp.class stackods n qrange p25 p75;

var weight height;

ods output summary=ranges;

run;

*create data with outliers to check;

data capped;

set sashelp.class;

if name='Alfred' then weight=220;

if name='Jane' then height=-30;

run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;

set &dset;

if &var>&upper then &var=&upper;

if &var<&lower then &var=&lower;

run;

%mend;

*create cutoffs and execute macro for each variable;

data cutoffs;

set ranges;

lower=p25-3*qrange;

upper=p75+3*qrange;

string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");

call execute(string);

run;

Regular Contributor
Posts: 183

Re: Variable Name Truncation

Thank you so much. It's a nice code but it will be very heavy as my dataset consists of 1 million observations and 1K variables. But i learnt a lot of tricks from your code. Yes, i have copied code from that site. I have written an email to the author regd issues you highlighted just above. God bless you! :-)

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 1088 views
  • 4 likes
  • 5 in conversation