TomKari
Onyx | Level 15

Correct. I use this technique when I'm pulling in some data from "somewhere", and I don't know how long the character variables are. This way, they're accommodated no matter what, and then I restructure the dataset to reduce the sizes. You can also apply the option to a library, in which case all the datasets that are created in the library will be compressed.
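For example, a minimal sketch of the library-level option (the library name and path here are hypothetical; adjust to your environment):

```sas
/* Hypothetical libref and path. COMPRESS=CHAR applies to every */
/* dataset subsequently created in MYLIB.                       */
libname mylib "/path/to/data" compress=char;
```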

 

Tom

lawatkey
Obsidian | Level 7
Correct as in, this compress option won't "stick" when I rejoin it?
I have table HAVE_1 that is millions of rows long.
There's also HAVE_2 that is much smaller, but has the inefficient lengths and formats.
My end result is a query like:

proc sql;
create table WANT as
select a.var1, b.*
from HAVE_1 a
left join HAVE_2 b
on a.x = b.x;
quit;

Will the compress=char on HAVE_2 make it so that WANT will have those compressed variables from HAVE_2, or is the compression on HAVE_2 meaningless if my goal is to join it right after?
Tom
Super User

Just set the system option COMPRESS.  That will change the default compression used when you don't explicitly set it with the dataset option COMPRESS=.  You can also set a default on the LIBNAME statement.
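A minimal sketch of the three levels (libref and path are hypothetical). The dataset option overrides the LIBNAME option, which overrides the system option:

```sas
options compress=char;                      /* session-wide default          */
libname mylib "/path/to/data" compress=yes; /* default for this library only */

data mylib.want (compress=no);              /* dataset option overrides both */
  set have;
run;
```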

Tom
Super User

If the goal is just to convert all of the character variables to the minimum length needed to store the values present, then something like this will work:

1. Use a DATA step view for the first step to avoid storing the tall dataset, and keep only the variables you need. Use LENGTH(), not LENGTHN(): LENGTHN() can return zero, and zero is an invalid length for a variable, while LENGTH() returns at least 1.
2. Query that view to get the maximum length for each variable, and use the result to generate one macro variable you can use in a LENGTH statement to define the variables.
3. Make a new dataset from the old one, placing the LENGTH statement before the SET statement so that the variables' lengths are defined before the compiler sees how they are defined in the source dataset. Add a FORMAT statement to remove any formats that might have been attached to the variables.

data _length_check / view=_length_check;
  set have;
  /* ARRAY must come before the LENGTH statement so the new character */
  /* variable __VARNAME is not picked up by _CHARACTER_               */
  array chars _character_;
  length __varname $32 __length 8;
  do over chars;
    __varname=vname(chars);
    __length=length(chars);  /* LENGTH(), not LENGTHN(): minimum is 1 */
    output;
  end;
  keep __varname __length;
run;

proc sql noprint;
  /* builds a value like:  NAME $10 'ODD NAME'N $25 ...  */
  select catx(' ',nliteral(__varname),cats('$',max(__length)))
    into :_lengths separated by ' '
  from _length_check
  group by __varname
  ;
quit;

data want ;
  length &_lengths ;   /* define lengths before SET sees the source */
  set have;
  format _character_ ; /* remove any attached character formats */
run;

This will have trouble if you have a lot of variables, as a macro variable's value is limited to 64K bytes.

It will also change the position of the variables in the new dataset: all of the character variables will now appear before any numeric variable. Also, the relative order of the character variables will be based on how the SQL query generated them. In principle that order is arbitrary, but in practice PROC SQL will return them alphabetically because of the GROUP BY clause.

 

If you want the variables' positions preserved, you will need a more complicated process.

lawatkey
Obsidian | Level 7
Now this worked for me perfectly! This is exactly what I was trying to do, thank you so much!! And thank you everyone who posted here, whether they were viable solutions for my needs or not, I learned a lot about macros, formatting, and compression 🙂


Discussion stats
  • 19 replies
  • 4121 views
  • 3 likes
  • 5 in conversation