I have a dataset with about 500,000 records in it and every one of my character variables has a length of 255. I would like to shorten them without truncating any values. I can make a guess, but I don't want to do that. I haven't been able to figure out a way to see what the longest value is. Let's say the longest piece of text only uses 25 of the 255 spaces available. How do I figure that out?
This is a static dataset so once I find what the true length of the longest value for a variable is, I do not need to worry about any future values going any longer than that.
Any help would be most appreciated.
proc sql noprint;
select max(length(variable)) into :varlength from have;
quit;
data want;
length variable $&varlength.;
set have;
run;
quick and crude, but should work.
You want a method that looks at all the character variables without having to name them all and you want the keep the varnum order of the original data. My example uses sashelp.heart which has only one character variable that needs to be re-sized.
You'll know it's working when you see warnings like this.
WARNING: Multiple lengths were specified for the variable DeathCause by input data set(s). This can cause truncation of data
%let data=sashelp.heart;
data size(keep=_name_ _length_);
set &data end=_eof;
array _c[*] _character_;
array _s[10] _temporary_;
do _i_ = 1 to dim(_c);
_s[_i_] = max(_s[_i_],length(_c[_i_]));
end;
if _eof then do _i_ = 1 to dim(_c);
length _name_ $32;
_name_ = vname(_c[_i_]);
_length_=_s[_i_];
output;
end;
run;
proc print;
run;
filename FT23F001 temp;
options missing=' ';
data _null_;
file FT23F001;
if 0 then set &data;
if _n_ eq 1 then do;
put 'Retain ' (_all_) (=) ';' @;
_file_ = translate(_file_,' ','=');
put;
end;
set size;
put 'Length ' _name_ '$' _length_ ';';
run;
data resized;
%include FT23F001 / source2;
set &data;
run;
proc contents varnum;
run;
proc contents data=&data varnum;
run;
You have invalid names in the RETAIN statement, caused by the use of name literals.
Get rid of those in the dataset first, rename the variables to valid V7 SAS names (no slashes or brackets).
Modified to support VALIDVARNAME=ANY.
options validvarname=any;
%let data=sashelp.shoes(rename=(product='Product type'n Returns='Returns to Store'n));
data size(keep=_name_ _length_);
set &data end=_eof;
array _c[*] _character_;
array _s[10] _temporary_;
do _i_ = 1 to dim(_c);
_s[_i_] = max(_s[_i_],length(_c[_i_]));
end;
if _eof then do _i_ = 1 to dim(_c);
length _name_ $64;
_name_ = nliteral(vname(_c[_i_]));
_length_=_s[_i_];
output;
end;
run;
proc print;
run;
filename FT23F001 temp;
options missing=' ';
data _null_;
file FT23F001;
if 0 then set &data;
if _n_ eq 1 then link putretain;
set size;
put 'Length ' _name_ '$' _length_ ';';
return;
putretain:
/* Preserve VARNUM order in the new dataset. */
length _VNAME_ $64;
do while(1);
call vnext(_vname_);
if upcase(_vname_) eq '_NAME_' then leave;
_vname_ = nliteral(_Vname_);
put 'Retain ' _Vname_ +(-1) ';';
end;
return;
run;
data resized;
%include FT23F001 / source2;
set &data;
run;
proc contents varnum;
run;
proc contents data=&data varnum;
run;
@lihongamerica wrote:
Hi Jade,
I run this code to some data, it works. But for another data there is some error:
29 %let data=sas2012.elig14;
30 %let resized=sas2012.resize_elig14;
Do you now how to solve it?
Hi Jade,
There is still some problem, This in my email, lihongamerica@gmail.com
Can I give you a phone call, or personal email your code is so fallacious, I hope you an explain some detail to me
> Can I give you a phone call, or personal email your code is so fallacious, I hope you an explain some detail to me
1. This is not at all how this works. This community is not here to give you free tech support. It is here to help you learn, and help others learn by having access to the discussions.
2. Fallacious is rather derogatory, I hope that's not what you meant.
3. This is not specified in the discussion, but in case someone wonders, the codes provided are also valid for multi-byte-encoded strings.
@ChrisNZ wrote:
> Can I give you a phone call, or personal email your code is so fallacious, I hope you an explain some detail to me
1. This is not at all how this works. This community is not here to give you free tech support. It is here to help you learn, and help others learn by having access to the discussions.
2. Fallacious is rather derogatory, I hope that's not what you meant.
3. This is not specified in the discussion, but in case someone wonders, the codes provided are also valid for multi-byte-encoded strings.
@ChrisNZ I'm sure @lihongamerica meant "Felicitous". I suspect autocorrect the culprit.
D _NULL_:
I like the fact that your code keeps all the variables in their original positions.
I'd offer only one meaningful change to it. My intention is to
For large datasets, this can save a lot of work. In the sashelp.shoes dataset, instead of traversing all 395 obs, this stops at obs 109, because every character var has met the current storage length by then. My code additions are in UPPERCASE, all in the DATA SIZE step:
data size(keep=_name_ _length_);
set &data end=_eof;
array _c[*] _character_;
array _s[10] _temporary_;
ARRAY STORED_LENGTH[10] _TEMPORARY_;
IF _N_=1 THEN DO I=1 TO DIM(_C);
STORED_LENGTH{I}=VLENGTH(_C{I});
END;
do _i_ = 1 to dim(_c);
IF _S[_I_]=STORED_LENGTH{_I_} THEN CONTINUE; /*Already at max? Skip this variable*/
_s[_i_] = max(_s[_i_],length(_c[_i_]));
IF _S[_I_}=STORED_LENGTH{_I_} THEN N_AT_MAX_LENGTH+1;
end;
IF N_AT_MAX_LENGTH<DIM(_C) AND _EOF=0 THEN RETURN;
/*if _eof then */
do _i_ = 1 to dim(_c);
length _name_ $64;
_name_ = nliteral(vname(_c[_i_]));
_length_=_s[_i_];
output;
end;
STOP;
run;
While we optimise, we might as well ascertain the number of character variables rather than
array _s[10] _temporary_;
Maybe something like
data _null_;
if 0 then set &data.;
length __NM __TP $4;
do while(1);
call vnext(__NM, __TP);
if upcase(__NM) eq '__NM' then leave;
__NB+(__TP='C');
end;
call symput('nb_char_var',__NB);
run;
A terser way.
data _null_;
if 0 then set &data.;
array C[*] _character_;
call symputx('nb_char_var',dim(C));
stop;
run;
Any better way?
[ Edited. Silly me!
Also, this method generates a warning if no character variables are found, so it's not very clean.]
@ChrisNZ wrote:
A terser way.
data _null_; if 0 then set &data.; array C[*] _character_; call symputx('nb_char_var',dim(C)); stop; run;
Any better way?
[ Edited. Silly me!
Also, this method generates a warning if no character variables are found, so it's not very clean.]
When I use this technique I add a dummy character variable.
17 %let data=sashelp.class(keep=_numeric_);
18 data _null_;
19 if 0 then set &data.;
20 length _dummy_ $1;
21 array C[*] _character_;
22 call symputx('nb_char_var',dim(C)-1);
23 stop;
24 run;
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
25 %put NOTE: &=nb_char_var;
NOTE: NB_CHAR_VAR=0
Was inspired to write a quick macro that allows just using the correct number on the go:
data T; array A[ %varcount(SASHELP.CLASS,C) ]; run;
/*********************************************************************************************** Macro name VARCOUNT ---------- Description This macro retrieves the number of variables in a table. ----------- Optionally, only numeric or character variables can be counted. Run %VarCount(help) for a complete description of the parameters and for examples. Parameters See in Help section below ---------- Inputs None ------ Outputs None ------- Generates SAS code No (SAS code in dosubl routine) ------------------ Calls other macros No ------------------ Example See in Help section below ------- Limitations 1- Basic parameter validation ----------- Author Christian Graffeuille ------ Changes V1. Initial version ------- ***********************************************************************************************/ %macro varcount ( table , vartype , options ) / minoperator ; %*** INIT *********************************; %local rc dsid random msgtype; %let vartype=%upcase(%sysfunc(compress(%superq(vartype),,ak))); %let options=%upcase(%sysfunc(compress(%superq(options),,ak))); %if %index(&options.,E%str()RROR ) %then %let msgtype=E%str()RROR ; %else %if %index(&options.,N%str()OTE ) %then %let msgtype=N%str()OTE ; %else %if %index(&options.,I%str()NFO ) %then %let msgtype=I%str()NFO ; %else %let msgtype=W%str()ARNING; %if ^%length(%superq(table)) %then %do; %let table=HELP; %end; %else %do; %let dsid = %sysfunc(open(%superq(table))); %if ^&dsid. %then %do; %put &msgtype: Could not open table %superq(table).; %let table=HELP; %end; %else %let rc=%sysfunc(close(&dsid)); %end; %if %length(&vartype) %then %let vartype=%substr(&vartype,1,1); %else %let vartype=A; %if %length(&vartype) %then %if ^( &vartype. in A C N ) %then %do; %put &msgtype: Variable type is invalid and must be one of: <blank> A C N. Subsequent characters are ignored.; %let table=HELP; %end; %*** HELP SCREEN *********************************; %if %qupcase(%superq(table))=HELP %then %do; %let rc=%sysfunc(dosubl(%nrstr( %macro _; %mend _; options ps=max ls=132 nonotes nosymbolgen nomlogic; data _null_; LINE=repeat('#',99); putlog '00'x; putlog LINE; putlog '# ______________________________ #'; putlog '# Help screen for macro VarCount #'; putlog '# ------------------------------ /\"_"/\ #'; putlog "# ( ='.'= ) #"; putlog '# This macro retrieves the number of variables in a data set. (") "" (") #'; putlog '# |" "| #'; putlog '# Data set options are allowed. /"/ \"\ #'; putlog '# (")"||"(") #'; putlog '# (( #'; putlog '# )) #'; putlog '# (( #'; putlog '# Parameters #'; putlog '# ========== #'; putlog '# table REQD Data set name. Data set options are accepted. #'; putlog '# #'; putlog '# vartype OPTL Filter on type of variable to count. #'; putlog '# Valid value. Only the first letter is used. One of: #'; putlog '# A Count all variable types (default if no value is provided) #'; putlog '# C Count character variables only #'; putlog '# N Count numeric variables only #'; putlog '# #'; putlog '# options OPTL Additional options. #'; putlog '# Valid value(s), one or more of: #'; /* putlog '# VERBOSE Prints a comment in the log when a table #';*/ /* putlog '# or a variable is not found. #';*/ /* putlog '# DEBUG Prints more technical information in the log. #';*/ putlog '# INFO NOTE WARNING ERROR #'; putlog '# Type of message in the log if something goes wrong #'; putlog '# such as table not found or unexpected parameter value. #'; putlog '# Default: WARNING #'; putlog '# #'; putlog '# Examples #'; putlog '# ======== #'; putlog '# #'; putlog '# %VarCount(help) Display this help screen #'; putlog '# #'; putlog '# %put => %VarCount(SASHELP.CLASS); => 5 #'; putlog '# #'; putlog '# %put => %VarCount(SASHELP.CLASS(drop=AGE),N); => 2 #'; putlog '# #'; putlog LINE; putlog '00'x; run; ))); %return; %end; %*** MAIN *********************************; %let random=%scan(%sysfunc(datetime()),1)%scan(%sysfunc(ranuni(0)),2); %local __nb_var&random ; %let rc=%sysfunc(dosubl(%nrstr( data _null_; if 0 then set &table.; length _DUMMYC&random._ $1 _DUMMYN&random._ 8; array _DUMMYC&random.[*] _character_; array _DUMMYN&random.[*] _numeric_; call symputx("__nb_var&random", ("&vartype" in ('A','C')) * (dim(_DUMMYC&random.)-1) + ("&vartype" in ('A','N')) * (dim(_DUMMYN&random.)-1) ); stop; run; ))); &&__nb_var&random. %mend varcount; /* %put => %varcount(help ); %put => %varcount(SASHELP.CLASS(drop=AGE), N ); %put => %varcount(SASHELP.CLASS(drop=AGE), C ); %put => %varcount(SASHELP.CLASS(drop=_ALL_) ); data T; array A[%varcount(SASHELP.CLASS,C)]; run; %put => %varcount( ); %put => %varcount(1X , X, ERROR ); */ /*** END OF FILE ***/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.