Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Variable Name Truncation

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 04-28-2015 03:11 PM
(4149 views)

Hi Team,

I am using a macro that cap outliers based on box-plot / turkey method. But the problem arises PROC MEANS truncates variable names and the same problem happens with PROC LOGISTIC.

Example variable name : prod_INVT_OLD_BAL_AVG_ACT_1_3M

This causes my macro to not working properly.

%macro box(input=, vars=, output= );

%let QR=;

%let Q1=;

%let Q3=;

%let varL=;

%let varH=;

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

/* Calculate the quartiles and inter-quartile range using proc univariate */

proc means data=&output nway noprint;

var &vars;

output out=temp QRANGE = P25= P75= / autoname;

run;

/* Extract the upper and lower limits into macro variables */

data temp;

set temp;

ID = 1;

array vara(&n) &QR;

array varb(&n) &Q1;

array varc(&n) &Q3;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vara);

lower(i) = varb(i) - 3 * vara(i);

upper(i) = varc(i) + 3 * vara(i);

end;

drop i _type_ _freq_;

run;

data temp1;

set &input;

ID = 1;

run;

data &output;

merge temp1 temp;

by ID;

array vars(&n) &vars;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vars);

if not missing(vars(i)) then do;

if vars(i) < lower(i) then vars(i) = lower(i);

if vars(i) > upper(i) then vars(i) = upper(i);

end;

end;

drop &QR &Q1 &Q3 &varL &varH ID i;

run;

%mend;

%box(input=abcd, vars = a b c d, output= test);

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is not an easy solution, but it is very scalable and flexible. You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

var &vars;

output out=qrange (keep=&vars) qrange=;

output out=p25 (keep=&vars) p25=;

output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics. To aggregate them later, you would need to read them one data set at a time:

data stats;

array vars {&n} &vars;

array ranges {&n} range1-range25;

array q1 {&n} q1_1 - q1_&n;

array q3 {&n} q3_1 - q3_&n;

set qrange;

do _i_=1 to &n;

ranges{_i_} = vars{_i_};

end;

set p25;

do _i_=1 to &n;

q1{_i_} = vars{_i_};

end;

set p75;

do _i_=1 to &n;

q3{_i_} = vars{_i_};

end;

... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names. Good luck.

7 REPLIES 7

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Do you know what the maximum length of a SAS variable name is? When you provide a 30 character variable and use autoname option to add enough characters to exceed that number of characters what should SAS do for a result?

This will be a problem with any fixed length limit of variable names though:

prod_INVT_OLD_BAL_AVG_ACT_1_3M

could be

ProdInvtOldBalAvgAct_1_3M

and save a few characters without losing information. Not that I want to type either of those very often.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is not an easy solution, but it is very scalable and flexible. You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

var &vars;

output out=qrange (keep=&vars) qrange=;

output out=p25 (keep=&vars) p25=;

output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics. To aggregate them later, you would need to read them one data set at a time:

data stats;

array vars {&n} &vars;

array ranges {&n} range1-range25;

array q1 {&n} q1_1 - q1_&n;

array q3 {&n} q3_1 - q3_&n;

set qrange;

do _i_=1 to &n;

ranges{_i_} = vars{_i_};

end;

set p25;

do _i_=1 to &n;

q1{_i_} = vars{_i_};

end;

set p75;

do _i_=1 to &n;

q3{_i_} = vars{_i_};

end;

... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names. Good luck.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What do you want in the output: The complete long variable name or the full suffix for the statistics requested with the autoname option? With variables named prod_INVT_OLD_BAL_AVG_ACT_1_3M you can't have both.

You can get lists of the variables generated by proc means like this:

proc sql ;

select NAME into : QR separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'QRANGE')>0;

select NAME into : Q1 separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'P25')>0;

select NAME into : Q3 separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'P75')>0;

quit;

Assuming none of the variables have the key words QRANGE, P25 or P75 as part of their names.

You wouldn't need the

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

any more.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Since you are using a macro you can make your own names for the variables you generate.

Call them VAR1 to VAR20 or SAM, JOE, FRED.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Your code has several mistakes beyond your stated issue, that would be more concerning to me:

1. proc means references the &output dataset when it should probably reference the &input dataset.

2. The following would only keep each macro variable retaining the last values, i.e. it doesn't do what you think it does.

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

EDIT:

I think this would have been your original source for the macro:

SAS Macro : Capping Outliers - Listen Data

Here's a quick sample I wrote that works differently. Unless you have huge data where multiple passes on the data is an issue a modification of this method should work well.

SAS Box-Plot/Tukey Method of Capping Outliers

*Calculate IQR and first/third quartiles;

proc means data=sashelp.class stackods n qrange p25 p75;

var weight height;

ods output summary=ranges;

run;

*create data with outliers to check;

data capped;

set sashelp.class;

if name='Alfred' then weight=220;

if name='Jane' then height=-30;

run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;

set &dset;

if &var>&upper then &var=&upper;

if &var<&lower then &var=&lower;

run;

%mend;

*create cutoffs and execute macro for each variable;

data cutoffs;

set ranges;

lower=p25-3*qrange;

upper=p75+3*qrange;

string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");

call execute(string);

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Are you ready for the spotlight? We're accepting content ideas for **SAS Innovate 2025** to be held May 6-9 in Orlando, FL. The call is **open **until September 25. Read more here about **why** you should contribute and **what is in it** for you!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.