hi all,
i have a dataset which consist of character variables and numeric variables, as for the character variables, their length are defined as 8000 by default,so i want to shrink the dataset,change the length of character variable.
as i should find all the character variable firstly.
i have tried the code:
%macro change(dsn);
data _null_;
set &dsn;
array qqq(*) _character_;
call symput('siz',put(dim(qqq),5.-L));
stop;
run;
data _null_;
set &dsn end=done;
array qqq(&siz) _character_;
array www(&siz.);
if _n_=1 then do i= 1 to dim(www);
www(i)=0;
end;
do i = 1 to &siz.;
www(i)=max(www(i),length(qqq(i)));
end;
retain _all_;
if done then do;
do i = 1 to &siz.;
length vvv $50;
vvv=catx(' ','length',vname(qqq(i)),'$',www(i),';');
fff=catx(' ','format ',vname(qqq(i))||' '||
compress('$'||put(www(i),3.)||'.;'),' ');
call symput('lll'||put(i,3.-L),vvv) ;
call symput('fff'||put(i,3.-L),fff) ;
end;
end;
run;
data &dsn._;
%do i = 1 %to &siz.;
&&lll&i
&&fff&i
%end;
set &dsn;
run;
%mend;
just from the sas help,and it illustrations is:
The first DATA Step counts the number of character variables in the data set and puts it into a macro variable. The next step loops through all of the data looking for the maximum number of characters in each variable. When all of the rows have been checked, a LENGTH statement and FORMAT statement are generated for each variable and saved in macro variables. In the final DATA Step the macro loop pushes the LENGTH and FORMAT statements out to the DATA Step ahead of the SET statement so that the length and format are changed.
NOTE that a new data set is created rather than overwriting the original. If you prefer to replace the original, remove the underscore from the final DATA Statement.
however,it did not works. the error is :
Line generated by the macro variable "FFF30".
1 format ITEM6092 $2E3.;
---
31
-
22
200
ERROR 31-185: Format 2E3 is unknown.
ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: Line generated by the macro variable "FFF50".
1 format ITEM18370 $4E3.;
---
31
-
22
200
ERROR 31-185: Format 4E3 is unknown.
ERROR 22-322: Syntax error, expecting one of the following: a name, -, ;, DEFAULT, _ALL_,
_CHARACTER_, _CHAR_, _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WRDS_ may be incomplete. When this step was stopped there were 0
observations and 67 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
i have no idea about this.
thank you!!
@JNWong wrote:
sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other is change the length of all character variables,find the maximum length for each character variables and used to define the length .as for the numeric variables, the length would be retained.
after that i will get a new dataset.
Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.
Then just use
%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)
It is easy for SQL.
data class;
set sashelp.class;
run;
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='CLASS' and type='char')) end=last;
if _n_=1 then call execute('proc sql;alter table class modify');
call execute(catt(name,' char(200)'));
if not last then call execute(',');
else call execute(';quit;');
run;
sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other is change the length of all character variables,find the maximum length for each character variables and used to define the length .as for the numeric variables, the length would be retained.
after that i will get a new dataset.
@JNWong wrote:
sorry for another question,can the code apply to the whole dataset? since i need two things ,one is get all character variables firstly. the other is change the length of all character variables,find the maximum length for each character variables and used to define the length .as for the numeric variables, the length would be retained.
after that i will get a new dataset.
Download the macro code from http://support.sas.com/kb/24/addl/fusion24804_1_squeeze.zip. Unpack the file (squeeze.txt), store it on your SAS machine and %include it.
Then just use
%squeeze(dsnin=your_existing_dataset,dsnout=your_new_dataset)
Sure. But that is another question.
To get to the root of the problem: why is the length of character variables 8000 "by default"? The SAS default length for character variables is 8, and the default used by some character functions is 200, so your "default" comes from somewhere else.
So you either have faulty documentation, or some DBA people fucked up, or you have a definition that makes sense (as there may be character variables designed to store free text), but where the bad effects (storage, I/O consumption) can be mitigated by using the compress=yes option on the dataset.
thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters.
@JNWong wrote:
thanks,i do not know the specific reson,as it comes from a database. maybe made by otherone. i remembered that the lyrics can be 32579 for characters.
Well, if the database actually had such definitions, it is to be assumed that there is a reason for that. In such cases I keep the defined lengths from the DB and use compress=yes to reduce physical dataset sizes. I also use the tagsort option when sorting such datasets (to keep the utility file small).
The %squeeze macro will automatically set the correct needed length for your variables, if you want to make a change on the SAS side.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.