DATA Step, Macro, Functions and more

Finding the max length that occurs within each variable

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

Finding the max length that occurs within each variable

Imagine that you have 1 table with roughly 150 character variables.

 

You need to find the maximum value length that occurs within each variable.

 

What's a good way to determine this?

 

I'm mostly working in DI Studio, and don't want to manually type "max(length(varname))" 150 times. Would appreciate code suggestions.


Accepted Solutions
Solution
‎05-04-2016 09:26 AM
Respected Advisor
Posts: 3,777

Re: Finding the max length that occurs within each variable

[ Edited ]

A hash of character variables constructed from an array of all _CHARACTER_ variables can be used to keep a running MAX length that is output at the end.

 

%let file=sashelp.heart;

/*------------------------------------------------------------------------*
 | data set _OPTLEN will have one obs for each character variable with
 | name and length as max length of each.
 *------------------------------------------------------------------------*/
data _null_;
   if 0 then set &file(keep=_character_);
   array _c[*] _character_;
   declare hash h();
   h.definekey('_n_');
   length _memname_ _name_ $32 _l_ 8;
   retain _memname_ "%scan(&file,-1,.)";
   h.definedata('_memname_','_name_','_l_');
   h.definedone();
   _l_=0;
   do _n_ = 1 to dim(_c);
      _name_ = vname(_c[_n_]);
      _rc_ = h.add();
      end;
   do until(eof);
      set &file(keep=_character_) end=eof;
      do _n_ = 1 to dim(_c);
         _rc_ = h.find();
         _l_ = _l_ max length(_c[_n_]);
         _rc_ = h.replace();
         end;
      end;
   length _dataset_ $256;
   _dataset_ = cats('_optlen','(rename=(_memname_=memname _name_=name _l_=length))');
   h.output(dataset:strip(_dataset_));
   stop;
   run;
proc print data=_optlen;
   run;

Capture.PNG

View solution in original post


All Replies
Super User
Posts: 5,255

Re: Finding the max length that occurs within each variable

Not knowing what kind of data you are talking about, but is this i DW data, 150 variables is just too much in one table.

By transposing/normalizing the data, you get much better control of it, and operations like this is much simpler to develop/maintain.

 

Is this an ad-hoc thing that you want to use as a basis for your DI Studio/ETL design, or do you wish to incorporate this logic into your job? If the later - how do you intend to use it?

Data never sleeps
Super User
Super User
Posts: 7,392

Re: Finding the max length that occurs within each variable

Am with @LinusH, working with lots of variables is not the best idea.  The simple way to think about tables for Programming purpose is that variables are Structure - this should be minimal, not changing.  Rows are data, these can expand and shrink as needed.

Once you get this concept, and also that the data you display in output reports is not necessarily the same as the data you program with, you will find your problem is a very simple one:

Data looks like:

<other vars>   VAR_NAME        VAR_TEXT

...                  CHARACTER1   ABC

...                  CHARACTER2   DEFTY

...

So your original columns are now rows, then:

proc sql;
  create table LENGTHS as
  select VAR_NAME,
            max(lengthn(VAR_TEXT)) as MAX_LEN
  from   HAVE
  group by VAR_NAME;
quit;
Solution
‎05-04-2016 09:26 AM
Respected Advisor
Posts: 3,777

Re: Finding the max length that occurs within each variable

[ Edited ]

A hash of character variables constructed from an array of all _CHARACTER_ variables can be used to keep a running MAX length that is output at the end.

 

%let file=sashelp.heart;

/*------------------------------------------------------------------------*
 | data set _OPTLEN will have one obs for each character variable with
 | name and length as max length of each.
 *------------------------------------------------------------------------*/
data _null_;
   if 0 then set &file(keep=_character_);
   array _c[*] _character_;
   declare hash h();
   h.definekey('_n_');
   length _memname_ _name_ $32 _l_ 8;
   retain _memname_ "%scan(&file,-1,.)";
   h.definedata('_memname_','_name_','_l_');
   h.definedone();
   _l_=0;
   do _n_ = 1 to dim(_c);
      _name_ = vname(_c[_n_]);
      _rc_ = h.add();
      end;
   do until(eof);
      set &file(keep=_character_) end=eof;
      do _n_ = 1 to dim(_c);
         _rc_ = h.find();
         _l_ = _l_ max length(_c[_n_]);
         _rc_ = h.replace();
         end;
      end;
   length _dataset_ $256;
   _dataset_ = cats('_optlen','(rename=(_memname_=memname _name_=name _l_=length))');
   h.output(dataset:strip(_dataset_));
   stop;
   run;
proc print data=_optlen;
   run;

Capture.PNG

Frequent Contributor
Posts: 89

Re: Finding the max length that occurs within each variable

That worked perfectly, @data_null__! Will store it for future use. Appreciate it. Smiley Happy

Super User
Posts: 9,671

Re: Finding the max length that occurs within each variable

data _null_;
 set sashelp.vcolumn
    (keep=libname memname name type
     where=(libname='SASHELP' and memname='HEART' and type='char')) end=last;
if _n_ eq 1 then call execute('proc sql;create table temp as select ');
call execute(cat('max(length(',name,')) as ',name));
if last then call execute('from sashelp.heart;quit;');   
 else call execute(',');
run;
proc transpose data=temp out=want name=name prefix=length;
run;

x.png

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 379 views
  • 4 likes
  • 5 in conversation