In SAS, column length is critical. SAS numeric columns have a default length of 8 bytes, allowing the storage of up to 16 digits. In contrast, SAS character columns can vary from 1 to 32,767 bytes, with 1 byte typically equating to 1 character. *Note: This applies to single-byte encoding. If using UTF-8 encoding, some characters may require up to 3 bytes. This variability is common with non-English characters.
While numeric columns generally don't need length adjustments, character columns often do to prevent truncation of values. The usual approach to altering column length involves the LENGTH statement in the DATA step. However, this method can disrupt the order of your columns. Why does this happen, and how can we fix it while maintaining the original column order? The issue stems from the Program Data Vector (PDV), but don't worry—we can resolve it using PROC SQL.
What is the PDV?
The PDV (Program Data Vector) is a memory area that includes each column referenced in the DATA step along with its attributes, such as name, type, and length. The PDV is created during the compilation phase of DATA step processing, which I like to call the "column" phase. During this phase, SAS scans your DATA step and assigns columns and their attributes. SAS also establishes "rules" for the PDV based on the statements and options used, like the WHERE statement (which determines which rows to read) or the DROP statement (which excludes specific columns after processing). Columns and their attributes are brought into the PDV in the order they appear in the DATA step code. When a SET statement is encountered, columns from the input table are added to the PDV in the same order.
Jump to PDV Example
Adjusting Column Length
Let's consider an example using a subset of the SASHELP.CARS table. Below is the code used to create the WORK.SH_CARS table and snapshots of the table and column attributes:
data work.sh_cars;
set sashelp.cars;
drop DriveTrain--Length;
where Make in ("GMC" "Lexus");
run;
proc contents data=work.sh_cars varnum;
run;
*Note: By default, the CONTENTS procedure lists variables alphabetically. VARNUM prints a list of the variable names in the order of their logical position in the table.
Suppose you want to write out the value "SUV" in the Type column as "Sport Utility Vehicle". Currently, Type has a length of 8 characters. What happens when you use an IF/THEN statement to update this value?
data work.sh_cars_truncated;
set work.sh_cars;
if Type="SUV" then Type="Sport Utility Vehicle";
run;
You'll notice the value is truncated after 8 characters, resulting in "Sport Ut". This occurs because the column length remains 8.
Your first thought might be to add a LENGTH statement to correct this. However, placing the LENGTH statement after the SET statement will not prevent truncation and will produce a warning in the log:
data work.sh_cars_afterSet;
set work.sh_cars;
length Type $ 25;
if Type="SUV" then Type="Sport Utility Vehicle";
run;
This happens because SAS first sees the SET statement, which brings the columns and their attributes into the PDV. Since Type is initially assigned with a length of 8, the subsequent LENGTH statement can't alter it. SAS suggests moving the LENGTH statement before the SET statement:
data work.sh_cars_beforeSet;
length Type $ 25;
set work.sh_cars;
if Type="SUV" then Type="Sport Utility Vehicle";
run;
proc contents data=work.sh_cars_beforeSet varnum;
run;
Now, Type correctly has a length of 25 characters, but it appears first in the table, disrupting the original column order.
The Big Question: How Can We Maintain Column Order While Changing Column Length?
To maintain column order, we'll use PROC SQL dictionary tables and macro variables.
Dictionary tables are special read-only PROC SQL tables or views that provide information about all SAS libraries, tables, system options, and external files associated with the current SAS session.
In our example, we will explore the DICTIONARY.COLUMNS table, which includes details like column names, types, lengths, and formats for all tables known to the SAS session.
The DESCRIBE TABLE clause displays column names and labels in the SAS log. The SELECT clause will select all columns from DICTIONARY.COLUMNS. We'll limit our table to include only column information for the WORK.SH_CARS table using the WHERE clause. The LIBNAME and MEMNAME values are case-sensitive and must be written in all capital letters. The following code retrieves and displays the column labels from the DICTIONARY.COLUMNS table:
proc sql;
describe table dictionary.columns;
select *
from dictionary.columns
where libname="WORK" and memname="SH_CARS";
quit;
I've included partial results because we're particularly interested in the Name column. By selecting only Name, we obtain a list of column names in the original order:
proc sql;
select name
from dictionary.columns
where libname="SASHELP" and memname="CLASS";
quit;
Using the INTO clause, we can store the values in the Name column in a macro variable colNames, separating them with a single space. The %PUT statement prints the macro variable text in the log:
proc sql;
select name
into :colNames separated by " "
from dictionary.columns
where libname="WORK" and memname="SH_CARS";
quit;
%put &colNames;
Now, we have a macro variable containing our column names in the correct order. In the DATA step, we can use the RETAIN statement with the macro variable, ensuring the column names remain in order. SAS will then process the LENGTH statement to assign the type as character and length as 25 bytes for the Type column. Then SAS will bring in the rest of the column attributes from the input table on the SET statement.
data work.sh_cars_orderPreserved;
retain &colNames.;
length Type $ 25;
set work.sh_cars;
if Type="SUV" then Type="Sport Utility Vehicle";
run;
proc contents data=work.sh_cars_orderPreserved varnum;
run;
This approach allows us to change the length of a column while maintaining the original order of columns.
Example: The PDV
To visualize the PDV, we’ll use the SASHELP.FISH table, which has the following columns and column attributes:
Next, we’ll modify the table using the following DATA step:
data work.fishLength;
set sashelp.fish;
where Species = "Parkki";
drop Length1--Width;
AvgLength=round(mean(of Length1-Length3), .1);
run;
When this DATA step is executed, SAS creates the PDV for the WORK.FISHLENGTH table. All columns and attributes from the input table specified in the SET statement are brought into the PDV. Additionally, SAS adds the AvgLength column that we're creating. The column type and length for AvgLength are determined based on the numeric columns used in its calculation, which default to a length of 8.
SAS also processes the WHERE and DROP statements as rules. The WHERE statement ensures that only rows where Species is equal to "Parkki" are included, and after all data manipulations, the DROP statement removes the columns from Length1 to Width. The following is a mock-up of the PDV, showing column attributes and rules with their initialized values set to missing:
After the DATA step is executed, we will be left with the following table:
Jump to Adjusting Column Length
Find more articles from SAS Global Enablement and Learning here.
... View more