Hi!
I have a table called vtmp. All the character type variable has a length of 1000 (see tab procContents in the attached excel). The actualLength tab has the desired length. Is there a way I can change the length using data step? The ideal way is that SAS code should calculate the max length of all character type field and reset that field length to the max.
I am thinking of a pseudo code like this:
data vtmp_new;
length cap_batch_id 6
length claim_number 12
….
….
set vtmp;
run;
The above is when I have precalculated the lengths individually and provide the information. The ideal would be that SAS reads the table, calculates the max length, and sets it to that length.
Data vtmp_new;
Set vtmp;
If cap_batch_id is char type then length cap_batch_id max(length(cap_batch_id))
…
..
Run
I am reading data from big data and all fields defaults to 36000 unless I set it to 1000.
You are better off dealing with the problem at it's root: correct the import process and rerun it.
One somewhat ugly approach is to run the code with only the length statements. SAS numerics are limited to length 8. Ensure the character variables are defined appropriately.
Such as
data vtmp_new;
length cap_batch_id $ 6 ;
length claim_number $ 12;
run;
Run that code and you get a data set that has the characteristics you want but is empty.
Use Proc Append with the force option to add the extracted using the force option so the long variables get shoved into the shorter version.:
Proc append base=vtmp_new data=extractedfromdb force nowarn;
run;
Note that if you have lots of variables that need the same length the code looks like:
length var1 var2 var3 ... $ 1000; or whatever. If the variables have a similar name then you may be able to use lists.
Here is one program to look at a dataset and calculate the maximum length used for character variables and create an empty data set as above to use append with.
data work.class; set sashelp.class; run; proc sql; select name, cats('l',name) into : varnames separated by ' ', : lvars separated by ' ' from dictionary.columns where libname='WORK' and memname='CLASS' and type='char'; select name into : numvars separated by ' ' from dictionary.columns where libname='WORK' and memname='CLASS' and type='num'; quit; data temp; set sashelp.class; array var &varnames; array len &lvars; do i=1 to dim(var); len[i]=length(var[i]); end; run; proc summary data=temp; var &lvars; output out=length max=; run; data _null_; set length end=eof; array var &varnames; array len &lvars; length varn $ 32 str $ 200; if _n_=1 then do; Call execute('data newdatset;');/* put the name of the new dataset you want here*/ Call execute("Length &numvars 8;"); /* make sure the numeric variables are created*/ end; /* create one length statement for each character variable*/ do i=1 to dim(var); varn = vname(var[i]); str= catx(' ','Length',Varn,'$',len[i],';'); call execute(str); end; if eof then do; call execute('run;'); end; run;
You can use next code - just addapt the 3 macro variables at top of code:
%let lib = SASHELP; /* library name */
%let dsnin = CARS; /* input dataset name */
%let dsnout = test; /* output dataset name */
proc sql;
select name into: names separated by ' '
from sashelp.vcolumn
where libname="&lib" and memname = "&dsnin" and type = 'char';
quit;
data _NULL_;
names = strip("&names");
nvars = countw(names);
call symput('nvars',left(nvars));
do i=1 to nvars;
varx = scan(names,i);
call symput('v'||left(i),varx);
end;
run;
%put NAMES= %trim(&names) NVARS=&nvars;
options mprint;
%macro ex;
data _null_;
set &lib..&dsnin end=eof;
retain v1-v&nvars 0;
array namx $ _character_ ;
array maxl {&nvars} v1-v&nvars;
%do i=1 %to &nvars;
var = namx(&i);
maxl(&i) = max(maxl(&i), length(strip(&&v&i)));
%end;
if eof then do;
call execute("data &dsnout; length "); /* statement creates the output */
do i=1 to dim(maxl);
var=vname(namx(i));
len=maxl(i);
call execute(var||' $'||left(len)||' ');
end;
call execute("; set &lib..&dsnin; run;");
end;
run;
%mend ex;
%ex;
/* T007670 Optimize the lengths of SAS variables
Character code from Roland Rashleigh-Berry (RIP)
Numeric code from Rick Langston
HAVE
====
Up to 40 obs from class total obs=19
Variables in Creation Order
# Variable Type Len
1 NAME Char 100
2 SEX Char 400
3 AGE Num 8
4 HEIGHT Num 8
5 WEIGHT Num 8
Obs NAME SEX AGE HEIGHT WEIGHT
1 Alfred M 14 69 113
2 Alice F 13 57 84
3 Barbara F 13 65 98
4 Carol F 14 63 103
5 Henry M 14 64 103
6 James M 12 57 83
7 Jane F 12 60 85
8 Janet F 15 63 113
9 Jeffrey M 13 63 84
WANT
====
Variables in Creation Order
# Variable Type Len
1 NAME Char 7
2 SEX Char 1
3 AGE Num 3
4 HEIGHT Num 3
5 WEIGHT Num 3
WORKING CODE
============
* output dataset can be the same as input dataset;
%utl_optlen(inp=class,out=class_cmp);
* make some data;
data class;
length name $100 sex $400;
set sashelp.class;
weight=round(weight);
height=round(height);
run;quit;
FULL SOLUTION
============
/* set the length of all char variables in a sas dataset to longest observed value */
%macro utl_optlen(
inp= /* input dataset */
,out= /* output dataset */
,compress=no /* output compression */
)
/ des="Create a length and retain statement to optimize dataset and variable attributes";
%local retain num char dsid res nvars rc;
* Input exist and/or empty;
%let dsid = %sysfunc(open(&inp,is));
%let nvars = 0;
%if &dsid ne 0 %then %do;
%let nvars = %sysfunc(attrn(&dsid,NVARS));
%end;
%if &dsid ne 0 %then %do; %let rc=%sysfunc(close(&dsid)); %end;
* Chang Chung;
%put %sysfunc(ifc(%sysevalf(%superq(inp )=,boolean) ,ERROR: Please Provide an Input dataset ,));
%put %sysfunc(ifc(%sysevalf(%superq(out )=,boolean) ,ERROR: Please Provide an output dataset ,));
%put %sysfunc(ifc(%sysevalf(%superq(compress )=,boolean) ,ERROR: Please Provide compression ,));
%put %sysfunc(ifc(%sysevalf(%superq(dsid )=0,boolean),ERROR: %sysfunc(sysmsg()) ,));
%put %sysfunc(ifc(%sysevalf(%superq(nvars )=0,boolean),ERROR: Dataset &inp is empty ,));
%let res= %eval
(
%sysfunc(ifc(%sysevalf(%superq(inp )=,boolean),1,0))
+ %sysfunc(ifc(%sysevalf(%superq(out )=,boolean),1,0))
+ %sysfunc(ifc(%sysevalf(%superq(compress )=,boolean),1,0))
+ %sysfunc(ifc(%sysevalf(%superq(nvars )=0,boolean),1,0))
+ %sysfunc(ifc(%sysevalf(%superq(dsid )=0,boolean),1,0))
);
%if &res = 0 %then %do; * passed;
ods listing close;;
ods output position=__layout;
proc contents data=&inp position;
run;quit;
ods listing;
* build retain statement;
* use separated to eliminate leading blanks;
proc sql noprint;
select sum(type='Char'), sum(type='Num')
into :chr separated by ' ', :num separated by ' ' from __layout;
;quit;
data _null_;
set &inp end=dne;
%if &num ne 0 %then %do;
array num[&num] _numeric_ ;
array lennum[&num] _temporary_;
%end;
%if &chr ne 0 %then %do;
array chr[&chr] _character_;
array lenchr[&chr] _temporary_;
do __i=1 to dim(chr);
if lengthn(chr[__i]) > lenchr[__i] then lenchr[__i]=length(chr[__i]);
end;
%end;
* this could be made more elegant but for maintenence reasons I kept it this way;
* Rick Langston first proposed this;
%if &num ne 0 %then %do;
do i=1 to dim(num);
if num[i] ne trunc( num[i], 7 ) then len = 8 ; else
if num[i] ne trunc( num[i], 6 ) then len = 7 ; else
if num[i] ne trunc( num[i], 5 ) then len = 6 ; else
if num[i] ne trunc( num[i], 4 ) then len = 5 ; else
if num[i] ne trunc( num[i], 3 ) then len = 4 ; else len=3;
if len > lennum[i] then lennum[i]=len;
end;
%end;
* build the program to optimize attributes;
if dne then do;
call execute(
"data &out(
compress=&compress
label='Dataset &inp processed by utl_optlen, Variable lengths may have been shortened and compression changed')
;");
call execute( 'retain');
do until (dnepos);
set __layout(keep=variable) end=dnepos;
call execute( variable);
end;
call execute( ';length');
%if &chr ne 0 %then %do;
do __i=1 to dim(chr);
var=catx(' ',vname(chr[__i]),cats('$',put(lenchr[__i],6.)));
call execute( var);
end;
%end;
%if &num ne 0 %then %do;
do __i=1 to dim(num); * do not want I variable;
var=catx(' ',vname(num[__i]),put(lennum[__i],6.));
call execute( var);
end;
%end;
call execute( ";set &inp;run;quit;");
end;
%end; * end do some work;
run;quit;
%mend utl_optlen;
Unfortunately there is no way to do this without reading through the data twice - once to determine lengths, and once to apply them:
filename tmp temp;
data _null_;
set have end=eod;
array chrvar {*} $1000 _character_;
array len{1000} _temporary_;
do c=1 to dim(chrvar);
len{c}=max(len{c},length(chrvar{c}));
end;
file tmp;
if eod then do c=1 to dim(chrvar);
vnam=vname(chrvar{c});
put vnam '$' len{c};
end;
run;
options source2;
data want;
length %include tmp ;;
set have;
run;
Notes
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.