turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Variable Name Truncation

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-28-2015 03:11 PM

Hi Team,

I am using a macro that cap outliers based on box-plot / turkey method. But the problem arises PROC MEANS truncates variable names and the same problem happens with PROC LOGISTIC.

Example variable name : prod_INVT_OLD_BAL_AVG_ACT_1_3M

This causes my macro to not working properly.

%macro box(input=, vars=, output= );

%let QR=;

%let Q1=;

%let Q3=;

%let varL=;

%let varH=;

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

/* Calculate the quartiles and inter-quartile range using proc univariate */

proc means data=&output nway noprint;

var &vars;

output out=temp QRANGE = P25= P75= / autoname;

run;

/* Extract the upper and lower limits into macro variables */

data temp;

set temp;

ID = 1;

array vara(&n) &QR;

array varb(&n) &Q1;

array varc(&n) &Q3;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vara);

lower(i) = varb(i) - 3 * vara(i);

upper(i) = varc(i) + 3 * vara(i);

end;

drop i _type_ _freq_;

run;

data temp1;

set &input;

ID = 1;

run;

data &output;

merge temp1 temp;

by ID;

array vars(&n) &vars;

array lower(&n) &varL;

array upper(&n) &varH;

do i = 1 to dim(vars);

if not missing(vars(i)) then do;

if vars(i) < lower(i) then vars(i) = lower(i);

if vars(i) > upper(i) then vars(i) = upper(i);

end;

end;

drop &QR &Q1 &Q3 &varL &varH ID i;

run;

%mend;

%box(input=abcd, vars = a b c d, output= test);

Accepted Solutions

Solution

04-28-2015
04:28 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 04:28 PM

This is not an easy solution, but it is very scalable and flexible. You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

var &vars;

output out=qrange (keep=&vars) qrange=;

output out=p25 (keep=&vars) p25=;

output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics. To aggregate them later, you would need to read them one data set at a time:

data stats;

array vars {&n} &vars;

array ranges {&n} range1-range25;

array q1 {&n} q1_1 - q1_&n;

array q3 {&n} q3_1 - q3_&n;

set qrange;

do _i_=1 to &n;

ranges{_i_} = vars{_i_};

end;

set p25;

do _i_=1 to &n;

q1{_i_} = vars{_i_};

end;

set p75;

do _i_=1 to &n;

q3{_i_} = vars{_i_};

end;

... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names. Good luck.

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 04:03 PM

Do you know what the maximum length of a SAS variable name is? When you provide a 30 character variable and use autoname option to add enough characters to exceed that number of characters what should SAS do for a result?

This will be a problem with any fixed length limit of variable names though:

prod_INVT_OLD_BAL_AVG_ACT_1_3M

could be

ProdInvtOldBalAvgAct_1_3M

and save a few characters without losing information. Not that I want to type either of those very often.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

04-28-2015 04:11 PM

I know the variable names in SAS can have a maximum length of 32 characters. I am asking for a solution. I cannot remove "_" from a variable name. Thanks for your time!

Solution

04-28-2015
04:28 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 04:28 PM

This is not an easy solution, but it is very scalable and flexible. You would need to change your logic and processing, but you could do this using three output data sets rather than one:

proc means data=&output nway noprint;

var &vars;

output out=qrange (keep=&vars) qrange=;

output out=p25 (keep=&vars) p25=;

output out=p75 (keep&vars) p75=;

run;

By creating three data sets, you are free to re-use the original variable names to hold the calculated statistics. To aggregate them later, you would need to read them one data set at a time:

data stats;

array vars {&n} &vars;

array ranges {&n} range1-range25;

array q1 {&n} q1_1 - q1_&n;

array q3 {&n} q3_1 - q3_&n;

set qrange;

do _i_=1 to &n;

ranges{_i_} = vars{_i_};

end;

set p25;

do _i_=1 to &n;

q1{_i_} = vars{_i_};

end;

set p75;

do _i_=1 to &n;

q3{_i_} = vars{_i_};

end;

... more calculations as needed;

run;

In my mind, it's the right way to go because you never need to change any variable names. Good luck.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 04:48 PM

What do you want in the output: The complete long variable name or the full suffix for the statistics requested with the autoname option? With variables named prod_INVT_OLD_BAL_AVG_ACT_1_3M you can't have both.

You can get lists of the variables generated by proc means like this:

proc sql ;

select NAME into : QR separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'QRANGE')>0;

select NAME into : Q1 separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'P25')>0;

select NAME into : Q3 separated by ' '

from dictionary.columns

where memname='TEMP' and Libname='WORK'

and index(upcase(NAME),'P75')>0;

quit;

Assuming none of the variables have the key words QRANGE, P25 or P75 as part of their names.

You wouldn't need the

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

any more.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 05:38 PM

Since you are using a macro you can make your own names for the variables you generate.

Call them VAR1 to VAR20 or SAM, JOE, FRED.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ujjawal

04-28-2015 10:31 PM

Your code has several mistakes beyond your stated issue, that would be more concerning to me:

1. proc means references the &output dataset when it should probably reference the &input dataset.

2. The following would only keep each macro variable retaining the last values, i.e. it doesn't do what you think it does.

%let n=%sysfunc(countw(&vars));

%do i= 1 %to &n;

%let val = %scan(&vars,&i);

%let QR = &QR &val._QRange;

%let Q1 = &Q1 &val._P25;

%let Q3 = &Q3 &val._P75;

%let varL = &varL &val.L;

%let varH = &varH &val.H;

%end;

EDIT:

I think this would have been your original source for the macro:

SAS Macro : Capping Outliers - Listen Data

Here's a quick sample I wrote that works differently. Unless you have huge data where multiple passes on the data is an issue a modification of this method should work well.

SAS Box-Plot/Tukey Method of Capping Outliers

*Calculate IQR and first/third quartiles;

proc means data=sashelp.class stackods n qrange p25 p75;

var weight height;

ods output summary=ranges;

run;

*create data with outliers to check;

data capped;

set sashelp.class;

if name='Alfred' then weight=220;

if name='Jane' then height=-30;

run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;

set &dset;

if &var>&upper then &var=&upper;

if &var<&lower then &var=&lower;

run;

%mend;

*create cutoffs and execute macro for each variable;

data cutoffs;

set ranges;

lower=p25-3*qrange;

upper=p75+3*qrange;

string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");

call execute(string);

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

04-29-2015 03:43 PM

Thank you so much. It's a nice code but it will be very heavy as my dataset consists of 1 million observations and 1K variables. But i learnt a lot of tricks from your code. Yes, i have copied code from that site. I have written an email to the author regd issues you highlighted just above. God bless you! :-)