BookmarkSubscribeRSS Feed
braam
Quartz | Level 8

I have many character variables, whose lengths are far greater than necessary. For example, the maximum length of a variable is 10, but its length is defined as 20. Is there a way to minimize these lengths at once so that each variable's length is the same as the maximum length of the variable? 

13 REPLIES 13
ghosh
Barite | Level 11
proc sql;
alter table tochange
  modify varname char(10) format=$10.;
quit;
braam
Quartz | Level 8
@ghosh Thanks. It's an intuitive solution. I have some follow-up questions.

1. Is there a simple way to revise many character variables without listing them all? (I'm thinking of something like _char_,. char*, but didn't find anything).

2. Or something like char1 (starting character variable in my list) -char10 (ending character variable)? I don't think there is something like this in SQL....

3. The code that you share with me assigns 10 to length of a variable. Extending this, I'm wondering if it can be the same as the maximum length of the variable. Well, some of my character variables use less than 10 characters while others use more than 10. Thanks again for your help.
ballardw
Super User

In places where lists of  variables are acceptable you can use _character_ to reference all the character variables.

 

It may for most purposes be easier just to change the display format when using the variables as Format will accept a list but most of the statements to actually modify the variable property, such as the example Proc SQL code will not.

 

format _character_ $5. ;

for example would be for the duration of a procedure only use/display 5 characters.

 

You could use Proc Datasets to permanently assign a format to all the variables.

 

And the ever popular question, if you only want a length of 10 why is the length longer? You may consider going back to the step that created the variables and change the code. But we would have to see the code used to create the variables to provide ideas.

 

Hint: Proc Import seldom gets the lengths you want and has potential other issues with character vs numeric values with some data sources.

Tom
Super User Tom
Super User

@ghosh wrote:
proc sql;
alter table tochange
  modify varname char(10) format=$10.;
quit;

One major drawback of that method is that is no way to clear the format attached to a variable with SQL syntax.  You can only set it something as in the example.  Or leave it unchanged.

 

So if someone had (accidentally? mistakenly?) attached the $20. format to the variable VARNAME in the example above then you are forced to perpetuate the mistaken by attaching the $10. format as in the posted code, or else the format will stay $20. even after the length was shortened from 20 to 10. 

 

So if you do want to remove the format attached to VARNAME you might want to use a data step to remake the dataset instead.

 

Or use PROC DATASETS to remove the format.

proc datasets nolist lib=work;
  modify tochange;
    format varname ;
  run;
quit;
hashman
Ammonite | Level 13

@braam:

Yup, it's simple. For example (a copy of sashelp.class being used as a guinea pig):

data have ;                                                                                                                                                                                                                                                     
  set sashelp.class ;                                                                                                                                                                                                                                           
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
%let dsn = have ;                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                
data _null_ ;                                                                                                                                                                                                                                                   
  array mm [99999] _temporary_ ;                                                                                                                                                                                                                                
  set &dsn end = end ;                                                                                                                                                                                                                                          
  array cc (j) _char_ ;                                                                                                                                                                                                                                         
  if dim (cc) = 0 then stop ;                                                                                                                                                                                                                                   
  do over cc ;                                                                                                                                                                                                                                                  
     mm[j] = mm[j] max length (cc) ;                                                                                                                                                                                                                            
  end ;                                                                                                                                                                                                                                                         
  if end ;                                                                                                                                                                                                                                                      
  call execute ("proc sql ;") ;                                                                                                                                                                                                                                 
  call execute ("  alter table &dsn modify ") ;                                                                                                                                                                                                                 
  length vname vdef vfmt $ 256 ;                                                                                                                                                                                                                                
  do over cc ;                                                                                                                                                                                                                                                  
    vname = cats ('"', vname (cc), '"n') ;                                                                                                                                                                                                                      
    vdef  = cats ("char (", mm[j], ")" ) ;                                                                                                                                                                                                                      
    vfmt  = cats ("format=$", mm[j], "." ) ;                                                                                                                                                                                                                    
    call execute (catx (" ", ifc (j > 1, ",", "  "), vname, vdef, vfmt)) ;                                                                                                                                                                                      
  end ;                                                                                                                                                                                                                                                         
  call execute ("; quit ;") ;                                                                                                                                                                                                                                   
run ;                             

What you'll see in the log (if the system option MSGLEVEL=i is in effect) after the above runs:

NOTE: CALL EXECUTE generated line.
1   + proc sql ;
2   +   alter table have modify
3   + "Name"n char (7) format=$7.
4   + , "Sex"n char (1) format=$1.
5   + ;
NOTE: Table WORK.HAVE has been modified, with 5 columns.
5   +   quit ;

If you happen to have no character variables in the file being processed in this manner, you'll get the harmless:

WARNING: Defining an array with zero elements.

and no SQL will be generated. The warning can be avoided by recoding the step using CALL VNEXT instead of the array CC. The "big enough" array MM can be replaced by a hash table. You can use these possibilities as a SAS exercise if you'd like.

 

Kind regards

Paul D.

  

SASKiwi
PROC Star

If the reason to reduce character variable lengths is to save disk space, then you can do this by re-writing your SAS datasets with the COMPRESS = YES or BINARY SAS option. Then you don't need to reduce lengths at all.

ScottBass
Rhodochrosite | Level 12

There are many ways to approach this, mine is but one.  You'll get a warning about "Multiple lengths were specified...".

 

Get the %loop and %loop_control macros here:

 

https://github.com/scottbass/SAS/blob/master/Macro/loop.sas

https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas

 

HTH...

 

proc copy in=sashelp out=work;
   select cars class shoes stocks zipcode;
run;

%macro code;
* get column metadata ;
proc contents data=&word out=contents (keep=name type length varnum) noprint;
run;

* get max length of character columns ;
data _null_;
   set &word end=eof;
   array chars{*} _character_;
   length _name_ $32 _max_length_ 8;
   if (_n_=1) then do;
      declare hash h();
      rc=h.defineKey("_name_");
      rc=h.defineData("_name_","_max_length_");
      rc=h.defineDone();
      do i=1 to dim(chars);
         _name_=vname(chars{i});
         rc=h.add();
      end;
   end;
   do i=1 to dim(chars);
      _name_=vname(chars{i});
      rc=h.find();
      _max_length_=max(_max_length_,length(chars{i}));
      rc=h.replace();
   end;
   if eof then do;
      rc=h.output(dataset:"work.max_length");
   end;
run;

* join ;
proc sql;
   create table attribs as
   select 
      a.name
      ,coalesce(b._max_length_,a.length) as length
      ,a.type
   from
      contents a
   left join
      max_length b
   on
      a.name=b._name_
   order by
      a.varnum
   ;
quit;

%macro attribs;
   %if (&type eq 1) %then %do;
      attrib &name length=&length;
   %end;
   %else %do;
      attrib &name length=$&length;
   %end;
%mend;

data &word._new;
   %loop_control(control=attribs,mname=attribs)
   set &word;
run;
%mend;

%loop(cars class shoes stocks zipcode)

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

And here's a version using PROC SQL + ALTER TABLE.

 

I always thought you had to re-run the data step if you wanted to change column length?  Is PROC SQL rewriting the table under the covers?  I didn't test with a big enough table to notice a performance difference.  But if it's just rewriting the dataset header and not having to rewrite the data, then this is definitely the better approach from a performance perspective.

 

Never too old to learn something new!

 

proc copy in=sashelp out=work;
   select cars class shoes stocks zipcode;
run;

%macro code;
* get max length of character columns ;
data _null_;
   set &word end=eof;
   array chars{*} _character_;
   length _name_ $32 _max_length_ 8;
   if (_n_=1) then do;
      declare hash h();
      rc=h.defineKey("_name_");
      rc=h.defineData("_name_","_max_length_");
      rc=h.defineDone();
      do i=1 to dim(chars);
         _name_=vname(chars{i});
         rc=h.add();
      end;
   end;
   do i=1 to dim(chars);
      _name_=vname(chars{i});
      rc=h.find();
      _max_length_=max(_max_length_,length(chars{i}));
      rc=h.replace();
   end;
   if eof then do;
      rc=h.output(dataset:"work.max_length");
   end;
run;

%macro alter;
   %let _name_=%trim(&_name_);
   %if (&__iter__ gt 1) %then ,;
   "&_name_"n char(&_max_length_)
%mend;

proc sql;
   alter table &word 
   modify
   %loop_control(control=max_length,mname=alter)
   ;
quit;
%mend;

%loop(cars class shoes stocks zipcode)

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Kurt_Bremser
Super User

Since the (character) variables are aligned within an observation in close succession, changing the length of one requires a change of observation length, and that needs a complete rewrite. You'll most probably have more observations in a dataset page, among other things.

It is a consequence of the way SAS stores datasets as sequential files, and not (e.g.) as separate blocks containing a single column each.

The only operations that don't need a rewrite are pure header changes (formats, informants, labels, names) and things like appending or marking observations as deleted.

hashman
Ammonite | Level 13

@ScottBass:

Scott, too bad. As @Kurt_Bremser has indicated, modifying the length of even a single variable does cause a complete rewrite. Just inflate a test data set like sashelp.class 1E6 times, and you'll see by the sheer ALTER execution time.

 

Kind regards

Paul D. 

ScottBass
Rhodochrosite | Level 12

@Kurt_Bremser and @hashman 

 

Yes, to be clear, that was my understanding:  to change the length of variable, the data set must be rewritten.

 

While the ALTER TABLE ... MODIFY approach in SQL is "syntactically elegant", it must be recreating the table under the covers.

 

I ran some test code in EG 7.15 which I've attached.  The execution time of the ALTER TABLE approach makes clear that the data set is recreated.

 

What is also evident is just how big an impact data set compression can make on performance, depending on the characteristics of the dataset (i.e. how much compression occurs).

 

(SAS Community Administrators:  can you work out a way to make it easier to attach a SAS Enterprise Guide project!!!  At least improve the generic error message when the attachment is rejected.)


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
hashman
Ammonite | Level 13

@ScottBass: First, thanks for sharing the project.

It just occurred to me that instead of rewriting the data set with the new lengths, an SQL view with new lengths can be created (or a DATA step view but its syntax with renaming and dropping would be more tedious). However, if the data set is to be read repeatedly, just rewriting it with the new lengths looks like a more sensible choice. I've had to do it on a number of occasions, particularly when the data came from an external RDBMS table, whose VARCHAR columns were interpreted and stored by SAS as crazily long fixed-length variables (apparently, to cover the territory without losing data). I've invariably found the pain of making one full pass through the data to determine the minimal lengths well worth it compared to using compression, not to mention that certain IDE environments (e.g., SAS Studio) don't work quite well when your character variables containing at most 10 characters are stored with the system length of $1000 or worse. Whether using a reduced-lengths view is better than compression performance-wise is an open question to be tested.   

Kind regards

Paul D.

 

 

ScottBass
Rhodochrosite | Level 12

In the deep recesses of my mind, I thought I might be reinventing the wheel ("what was the name of that macro again???")

 

While it was fun coding my project, here is another approach:

 

http://support.sas.com/kb/24/804.html

 

Google "%squeeze macro".

 

However, see also these comments:

 

https://groups.google.com/forum/#!msg/comp.soft-sys.sas/eTlvA9WimNM/HRq_G8W3N5QJ

 

I haven't reviewed the code, just wanted to post here since it's on topic.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 13132 views
  • 6 likes
  • 8 in conversation