Solved: Re: Macro to dynamically change the format of each variable to the max... - Page 2

TomKari · Posted 08-25-2020 02:42 PM

Correct. I use this technique when I'm pulling in some data from "somewhere", and I don't know how long the character variables are. This way, they're accommodated no matter what, and then I restructure the dataset to reduce the sizes. You can also apply the option to a library, in which case all the datasets that are created in the library will be compressed.

Tom

lawatkey · Posted 08-25-2020 02:45 PM

Correct as in, this compress option won't "stick" when I rejoin it?
I have table HAVE_1 that is millions of rows long.
There's also HAVE_2 that is much smaller, but has the inefficient lengths and formats.
My end result is a

create table WANT
select
a.var1, b.*
from HAVE_1 A
left join HAVE_2 B
on A.x = B.x;

Will the compress=char on HAVE_2 make it so that WANT will have those compressed variables from HAVE_2, or is the compression on HAVE_2 meaningless if my goal is to join it right after?

Tom · Posted 08-25-2020 02:50 PM

Just set the system option COMPRESS. That will change the default compression used when you don't explicitly set it with the dataset option COMPRESS=. You can also set a default on the LIBNAME statement.

Tom · Posted 08-25-2020 03:06 PM

If the goal is to just convert all of the character variables to minimum length needed to store the values present then something like this will work. You can use a data step view for the first step to avoid storing the tall dataset, and keep only the variables you need, do not use LENGTHN() function as zero is an invalid length for a variable. You can then query that view to get the max length for each variable and use it to generate one macro variable you could use in a LENGTH statement to define the variables. Then make a new dataset from the old one by placing the LENGTH statement before the SET statement so that the variables' lengths are defined before the compiler sees how they are defined in the source dataset. Add a format statement to remove any formats that might have been attached to the variables.

data _length_check / view=_length_check;
  set have;
  array chars _character_;
  length __varname $32 __length 8;
  do over chars;
    __varname=vname(chars);
    __length=length(chars);
    output;
  end;
  keep __varname __length;
run;

proc sql noprint;
  select catx(' ',nliteral(__varname),cats('$',max(__length)))
    into :_lengths separated by ' '
  from _length_check
  group by __varname 
  ;
quit;

data want ;
  length &_lengths ;
  set have;
  format _character_ ;
run;

This will have trouble if you have a lot of variables, as there is a 64K byte limit to the length of a macro variable.

It will also change the position of the variables in the new dataset. All of the character variables will now appear before any numeric variable. Also the relative order of the character variables will be based on how the SQL query generated them. That will be potentially random however in practice PROC SQL will order them alphabetically because of the group by clause.

If you want the variables positions preserved you will need a more complicated process.

lawatkey · Posted 08-25-2020 03:30 PM

Now this worked for me perfectly! This is exactly what I was trying to do, thank you so much!! And thank you everyone who posted here, whether they were viable solutions for my needs or not, I learned a lot about macros, formatting, and compression 🙂

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

Re: Macro to dynamically change the format of each variable to the maximum length found in that tabl

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away