Correct. I use this technique when I'm pulling in some data from "somewhere", and I don't know how long the character variables are. This way, they're accommodated no matter what, and then I restructure the dataset to reduce the sizes. You can also apply the option to a library, in which case all the datasets that are created in the library will be compressed.
Tom
Just set the system option COMPRESS. That will change the default compression used when you don't explicitly set it with the dataset option COMPRESS=. You can also set a default on the LIBNAME statement.
If the goal is to just convert all of the character variables to minimum length needed to store the values present then something like this will work. You can use a data step view for the first step to avoid storing the tall dataset, and keep only the variables you need, do not use LENGTHN() function as zero is an invalid length for a variable. You can then query that view to get the max length for each variable and use it to generate one macro variable you could use in a LENGTH statement to define the variables. Then make a new dataset from the old one by placing the LENGTH statement before the SET statement so that the variables' lengths are defined before the compiler sees how they are defined in the source dataset. Add a format statement to remove any formats that might have been attached to the variables.
data _length_check / view=_length_check;
set have;
array chars _character_;
length __varname $32 __length 8;
do over chars;
__varname=vname(chars);
__length=length(chars);
output;
end;
keep __varname __length;
run;
proc sql noprint;
select catx(' ',nliteral(__varname),cats('$',max(__length)))
into :_lengths separated by ' '
from _length_check
group by __varname
;
quit;
data want ;
length &_lengths ;
set have;
format _character_ ;
run;
This will have trouble if you have a lot of variables, as there is a 64K byte limit to the length of a macro variable.
It will also change the position of the variables in the new dataset. All of the character variables will now appear before any numeric variable. Also the relative order of the character variables will be based on how the SQL query generated them. That will be potentially random however in practice PROC SQL will order them alphabetically because of the group by clause.
If you want the variables positions preserved you will need a more complicated process.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.