DATA Step, Macro, Functions and more

change the length of all character variales in a dataset

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

change the length of all character variales in a dataset

hi all,

 

 i have a dataset which consist of character variables and numeric variables, as for the character variables, their length are defined as 8000 by default,so i want to shrink the dataset,change the length of character variable.

 

 as i should find all the character variable firstly.

 

i have tried the code:

%macro change(dsn);                                         
 
data _null_;                                                
  set &dsn;                                                 
  array qqq(*) _character_;                                 
  call symput('siz',put(dim(qqq),5.-L));                    
  stop;                                                     
run;                                                        
                                                            
data _null_;                                                
  set &dsn end=done;                                        
  array qqq(&siz) _character_;                              
  array www(&siz.);                                         
  if _n_=1 then do i= 1 to dim(www);                        
    www(i)=0;                                               
  end;                                                      
  do i = 1 to &siz.;                                        
    www(i)=max(www(i),length(qqq(i)));                      
  end;                                                      
  retain _all_;                                             
  if done then do;                                          
    do i = 1 to &siz.;                                      
      length vvv $50;                                       
      vvv=catx(' ','length',vname(qqq(i)),'$',www(i),';');  
      fff=catx(' ','format ',vname(qqq(i))||' '||           
          compress('$'||put(www(i),3.)||'.;'),' ');         
      call symput('lll'||put(i,3.-L),vvv) ;                 
      call symput('fff'||put(i,3.-L),fff) ;                 
    end;                                                    
  end;                                                      
run;                                                        
                                                            
data &dsn._;                                                
  %do i = 1 %to &siz.;                                      
    &&lll&i                                                 
    &&fff&i                                                 
  %end;                                                     
  set &dsn;                                                 
run;                                                        
                                                            
%mend;  

just from the sas help,and it illustrations is:

 

The first DATA Step counts the number of character variables in the data set and puts it into a macro variable. The next step loops through all of the data looking for the maximum number of characters in each variable. When all of the rows have been checked, a LENGTH statement and FORMAT statement are generated for each variable and saved in macro variables. In the final DATA Step the macro loop pushes the LENGTH and FORMAT statements out to the DATA Step ahead of the SET statement so that the length and format are changed.

NOTE that a new data set is created rather than overwriting the original. If you prefer to replace the original, remove the underscore from the final DATA Statement.

 

however,it did not works. the error is :

 

Line generated by the macro variable "FFF30".
1 format ITEM6092 $2E3.;
                                   ---
31
-
22
200
ERROR 31-185: Format 2E3 is unknown.

ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

NOTE: Line generated by the macro variable "FFF50".
1 format ITEM18370 $4E3.;
                                     ---
31
-
22
200

ERROR 31-185: Format 4E3 is unknown.

ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WRDS_ may be incomplete. When this step was stopped there were 0
observations and 67 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds

 

i have no idea about this.

 

thank you!!


Accepted Solutions
Solution
‎02-14-2017 04:15 AM
Super User
Posts: 6,927

Re: change the length of all character variales in a dataset


JNWong wrote:

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 


Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.

Then just use

%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 6,927

Re: change the length of all character variales in a dataset

Take a look at the %SQUEEZE macro.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,662

Re: change the length of all character variales in a dataset

It is easy for SQL.

 

data class;
 set sashelp.class;
run;

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='CLASS' and type='char')) end=last;
 if _n_=1 then call execute('proc sql;alter table class modify');
 call execute(catt(name,' char(200)'));
 if not last then call execute(',');
  else call execute(';quit;');
run;
Contributor
Posts: 44

Re: change the length of all character variales in a dataset

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 

Solution
‎02-14-2017 04:15 AM
Super User
Posts: 6,927

Re: change the length of all character variales in a dataset


JNWong wrote:

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 


Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.

Then just use

%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,662

Re: change the length of all character variales in a dataset

Sure. But that is another question.

Super User
Posts: 6,927

Re: change the length of all character variales in a dataset

To get to the root of the problem: why is the length of character variables 8000 "by default"? The SAS default length for character variables is 8, and the default used by some character functions is 200, so your "default" comes from somewhere else.

So you either have faulty documentation, or some DBA people fucked up, or you have a definition that makes sense (as there may be character variables designed to store free text), but where the bad effects (storage, I/O consumption) can be mitigated by using the compress=yes option on the dataset.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 44

Re: change the length of all character variales in a dataset

thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters. 

Super User
Posts: 6,927

Re: change the length of all character variales in a dataset


JNWong wrote:

thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters. 


Well, if the database actually had such definitions, it is to be assumed that there is a reason for that. In such cases I keep the defined lengths from the DB and use compress=yes to reduce physical dataset sizes. I also use the tagsort option when sorting such datasets (to keep the utility file small).

The %squeeze macro will automatically set the correct needed length for your variables, if you want to make a change on the SAS side.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 266 views
  • 0 likes
  • 3 in conversation