Welcome back to my series Fixing Broken Hearts and Macros! In the first post, we explored macro timing and quoting functions. This week, it’s time for a character variable length check. Just like checking in on someone’s character can reveal their true strengths, checking the length of character variables can show whether they’re using more space than they need. To help with that, I’ll introduce two macros: one that examines all character columns in a data set and reports their defined length alongside the longest stored value, and another that uses that information to adjust lengths where possible. A simple check like this can make your data sets smaller and your programs more efficient.
Running a character variable check helps ensure your data sets are as efficient as possible: smaller data sets process faster and use less memory. The length of character variables in SAS are measured in bytes, which determines how many characters can be stored for each individual value. Most standard letters and digits account for 1 byte each, however some international characters may require 2 or more bytes. For example, if the length of a column is 5 bytes, there can be at most 5 standard letters, digits, and spaces in each data value. When a character column’s defined length is larger than the actual string, SAS pads the value with blanks to fill the allocated length.
It may be helpful to identify columns where the defined length is longer than the maximum string, and reduce the length to account only for the longest possible value. This can reduce the overall file size and increase processing speed, especially for large data sets. Before making changes, it is important to consider whether character columns will be updated in the future. If they will, consider leaving these columns with the current length or define a reduced length with a reasonable number of extra characters for future values. Because of this, I broke the solution into two macros: %CharCheck and %CharResize. Run %CharCheck to generate a table that compares each column length with the maximum string length. Then run %CharResize to reduce each column lengths to the maximum string length. Below is an in-depth explanation of both macros. To download and view the full macros with only short comments throughout, see the full code on my github.
The full code for %charCheck can be found here.
The %charCheck macro creates a table of all character variables in a specified data set and lists the longest value in that variable, along with the variables defined length.
Below is sample output if the macro calls the sashelp.company data set:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
%charCheck accepts 3 parameters:
%macro charCheck(lib=work, data=, out=char_lengths);
…
%mend charCheck;
Inside the macro, local macro variables are defined. Local macro variables are defined inside the macro so the variables only exist while the macro is running. Defining local macro variables also prevents SAS from looking for the macro variables in the global symbol table- if they already existed there, the defined value would be used causing conflict.
%local dsid nvars i varname vartype varlen rc;
The %LET statement creates the macro variable dsid which stands for “data set ID”. %SYSFUNC is a macro function used to execute a normal SAS function. The OPEN function opens the data set with character values in input mode, meaning values can be read, not modified, and returns a value for data set ID. The value is a unique data set identifier if the data set is successfully opened, and 0 if the data set could not be opened successfully.
%let dsid = %sysfunc(open(&lib..&data,i));
For example, the following opens sashelp.company successfully. The value of dsid is 1.
%let dsid = %sysfunc(open(sashelp.company, i));
%put &=dsid;
Example code outside of the macros can be found here.
The following DO block checks that the value of dsid is not missing or 0, meaning the data set opened successfully. If it opened successfully, the DO block executes.
%if &dsid %then %do;
…
%end;
Inside the DO block, the %LET statement creates the macro variable nvars. Using the ATTRN function, the total number of variables in the data set is returned.
%let nvars = %sysfunc(attrn(&dsid,nvars));
For example, when sashelp.company is open, nvars returns a value of 8.
Then, a DATA step creates the output data set. Three variables are defined:
The STOP statement prevents SAS from writing a row of missing values, ensuring the data set begins as just the structure.
data &out;
length Variable $32. DefinedLength MaxLength 8.;
stop;
run;
The DO loop uses nvars to loop over all variables in the specified data set and creates the i macro variable. For example, if sashelp.company has 8 variables, the loop will execute 8 times, once for each variable.
%do i = 1 %to &nvars;
…
%end;
Inside the DO loop, the %LET statement creates vartype. This uses the VARTYPE function to determine whether the current variable is character (value of C) or numeric (value of N).
%let vartype = %sysfunc(vartype(&dsid,&i));
If the value of vartype is C, meaning the variable is character, the THEN DO block executes.
%if &vartype = C %then %do;
…
%end;
Inside the THEN DO block, varname is created to store the name of the current character variable using the VARNAME function. Varlen is created to store the defined length of the current character variable using the VARLEN function.
%let varname = %sysfunc(varname(&dsid,&i));
%let varlen = %sysfunc(varlen(&dsid,&i));
The following SQL procedure inserts one observation per iteration of the DO loop for the current character variable name, defined length and maximum length. These values are put into the columns Variable, DefinedLength, and MaxLength created in the earlier DATA step.
proc sql noprint;
insert into &out
select "&varname" as Variable,
&varlen as DefinedLength,
max(lengthn(&varname)) as MaxLength
from &lib..&data;
quit;
The final %LET statement uses the CLOSE function to close the data set and stores the return code in the rc macro variable. This verifies the data set was closed successfully, 0= success, nonzero= error.
%let rc = %sysfunc(close(&dsid));
%put Dataset close return code = &rc;
For example, if the data set is closed successfully, this is what the log looks like:
%let rc = %sysfunc(close(&dsid));
%put &=rc;
If the data set did not open successfully, the ELSE DO block executes and generates an error in the log.
%else %do;
%put ERROR: Dataset &lib..&data not found.;
%end;
To call the macro provide the desired values of lib, data and out.
%charCheck(lib=sashelp, data=company, out=char_lengths);
The table created can be used to analyze the character variables that can be reduced in length.
The full code for %charResize can be found here.
%charResize uses %charCheck to create a new data set with updated lengths if a character variables’ MaxLength is smaller than the DefinedLength.
%charResize accepts 4 parameters:
%macro charResize(lib=work, data=, outLib=work, out=);
…
%mend charResize;
Inside the macro, the %LOCAL statement defines local macro variables.
%local summary len_stmt;
The %LET statement creates a macro variable named summary that stores the value all_char_len. This will be the name of one of the data set created. Then, %charCheck is called using the values provided for lib and data in the %charResize macro call.
%let summary = all_char_len;
%charCheck(lib=&lib, data=&data, out=&summary);
The data set all_char_len created is identical to char_lengths produced from %charCheck.
The SQL procedure uses all_char_len to evaluate the columns where MaxLength is smaller than DefinedLength and puts these rows into the data set charvar_to_shrink.
proc sql noprint;
create table charvar_to_shrink as
select Variable, DefinedLength, MaxLength
from &summary
where MaxLength < DefinedLength and not missing(MaxLength);
quit;
The output for sashelp.company would appear as follows:
The next SQL procedure selects the values in the Variable column in charvar_to_shrink and concatenates them with a $ and the value in the MaxLength column. All of these are put into a macro variable named len_stmt, separated by spaces. This creates a length statement of all character columns that can have updated lengths.
proc sql noprint;
select catx(' ', Variable, cats('$', MaxLength))
into :len_stmt separated by " "
from charvar_to_shrink;
quit;
%put &=len_stmt;
For sashelp.company, len_stmt would appear in the log as follows:
The IF THEN statement is for error handling. In case no data set name was specified for the out parameter, the data set name will be the name of the original data set with _shrunk as a suffix.
%if %superq(out)= %then %let out=&data._shrunk;
Next, the DATA step creates a copy of the original data set with reduced character variable lengths. The output data set is named based on the values provided for the outLib and out parameters. The IF THEN DO statement inside evaluates whether the value of len_stmt is missing. If the value is missing, this would mean that no character columns need to be updated.
data &outLib..&out;
%if %superq(len_stmt)^= %then %do;
length &len_stmt;
%end;
set &lib..&data;
run;
To call the macro, provide the desired values of lib, data, outLib and out.
%charResize(lib=sashelp, data=company, outLib=work, out=company_new);
The final data set will vary. With sashelp.company as the specified data set, work.company_new is created as a copy with updated column lengths.
Partial output:
The updated column lengths can be viewed using PROC CONTENTS.
proc contents data=company_new;
run;
Just as it’s important to check in on a person’s character, it’s worthwhile to check in on your data set’s character… variables. Performing these character variable checks and resizing helps you identify where columns are using more space than they need and decide whether an update makes sense. By pairing the two macros, you can review defined lengths, adjust them when appropriate, and end up with a data set that’s both smaller and more efficient—without losing any integrity.
Find more articles from SAS Global Enablement and Learning here.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.