BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JNWong
Calcite | Level 5

hi all,

 

 i have a dataset which consist of character variables and numeric variables, as for the character variables, their length are defined as 8000 by default,so i want to shrink the dataset,change the length of character variable.

 

 as i should find all the character variable firstly.

 

i have tried the code:

%macro change(dsn);                                         
 
data _null_;                                                
  set &dsn;                                                 
  array qqq(*) _character_;                                 
  call symput('siz',put(dim(qqq),5.-L));                    
  stop;                                                     
run;                                                        
                                                            
data _null_;                                                
  set &dsn end=done;                                        
  array qqq(&siz) _character_;                              
  array www(&siz.);                                         
  if _n_=1 then do i= 1 to dim(www);                        
    www(i)=0;                                               
  end;                                                      
  do i = 1 to &siz.;                                        
    www(i)=max(www(i),length(qqq(i)));                      
  end;                                                      
  retain _all_;                                             
  if done then do;                                          
    do i = 1 to &siz.;                                      
      length vvv $50;                                       
      vvv=catx(' ','length',vname(qqq(i)),'$',www(i),';');  
      fff=catx(' ','format ',vname(qqq(i))||' '||           
          compress('$'||put(www(i),3.)||'.;'),' ');         
      call symput('lll'||put(i,3.-L),vvv) ;                 
      call symput('fff'||put(i,3.-L),fff) ;                 
    end;                                                    
  end;                                                      
run;                                                        
                                                            
data &dsn._;                                                
  %do i = 1 %to &siz.;                                      
    &&lll&i                                                 
    &&fff&i                                                 
  %end;                                                     
  set &dsn;                                                 
run;                                                        
                                                            
%mend;  

just from the sas help,and it illustrations is:

 

The first DATA Step counts the number of character variables in the data set and puts it into a macro variable. The next step loops through all of the data looking for the maximum number of characters in each variable. When all of the rows have been checked, a LENGTH statement and FORMAT statement are generated for each variable and saved in macro variables. In the final DATA Step the macro loop pushes the LENGTH and FORMAT statements out to the DATA Step ahead of the SET statement so that the length and format are changed.

NOTE that a new data set is created rather than overwriting the original. If you prefer to replace the original, remove the underscore from the final DATA Statement.

 

however,it did not works. the error is :

 

Line generated by the macro variable "FFF30".
1 format ITEM6092 $2E3.;
                                   ---
31
-
22
200
ERROR 31-185: Format 2E3 is unknown.

ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

NOTE: Line generated by the macro variable "FFF50".
1 format ITEM18370 $4E3.;
                                     ---
31
-
22
200

ERROR 31-185: Format 4E3 is unknown.

ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.

ERROR 200-322: The symbol is not recognized and will be ignored.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WRDS_ may be incomplete. When this step was stopped there were 0
observations and 67 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds

 

i have no idea about this.

 

thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@JNWong wrote:

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 


Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.

Then just use

%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)

View solution in original post

8 REPLIES 8
Ksharp
Super User

It is easy for SQL.

 

data class;
 set sashelp.class;
run;

data _null_;
 set sashelp.vcolumn(where=(libname='WORK' and memname='CLASS' and type='char')) end=last;
 if _n_=1 then call execute('proc sql;alter table class modify');
 call execute(catt(name,' char(200)'));
 if not last then call execute(',');
  else call execute(';quit;');
run;
JNWong
Calcite | Level 5

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 

Kurt_Bremser
Super User

@JNWong wrote:

sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other  is change the length of all character variables,find the maximum length for each character variables and used to define the length  .as for the numeric variables, the length would be retained.

 

after that i will get a new dataset.

 

 


Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.

Then just use

%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)
Ksharp
Super User
Sure. But that is another question.

Kurt_Bremser
Super User

To get to the root of the problem: why is the length of character variables 8000 "by default"? The SAS default length for character variables is 8, and the default used by some character functions is 200, so your "default" comes from somewhere else.

So you either have faulty documentation, or some DBA people fucked up, or you have a definition that makes sense (as there may be character variables designed to store free text), but where the bad effects (storage, I/O consumption) can be mitigated by using the compress=yes option on the dataset.

JNWong
Calcite | Level 5

thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters. 

Kurt_Bremser
Super User

@JNWong wrote:

thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters. 


Well, if the database actually had such definitions, it is to be assumed that there is a reason for that. In such cases I keep the defined lengths from the DB and use compress=yes to reduce physical dataset sizes. I also use the tagsort option when sorting such datasets (to keep the utility file small).

The %squeeze macro will automatically set the correct needed length for your variables, if you want to make a change on the SAS side.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5346 views
  • 1 like
  • 3 in conversation