BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lawatkey
Obsidian | Level 7

So my code, which works up until the actual macro at the end, seems fine in theory, but I feel like SAS is not a fan of using the macro variables inside of the format options.  Below is my code, everything up until the macro works. (Pardon the casing, I write in caps, but the first part of code I found in lowercase).

 


data inter;
set have;
array chars _character_;
length __varname $32;
do over chars;
__varname=vname(chars);
__length=lengthn(chars);
output;
end;
run;

proc sql;
create table want as
select __varname as varname, max(__length) as max_length
from inter
group by varname
;
quit;

 

PROC SQL;
SELECT COUNT(DISTINCT VARNAME) INTO :VARCNT TRIMMED
FROM WORK.WANT;

 

PROC SQL;
SELECT COMPRESS(VARNAME) INTO :VRNM1 - :VRNM&VARCNT
FROM WORK.WANT;

 

PROC SQL;
SELECT MAX_LENGTH INTO :ML1 - :MX&VARCNT
FROM WORK.WANT;

 

 

%MACRO CLEAN;
%DO i = 1 %TO &VARCNT;
PROC SQL;
ALTER TABLE HAVE
MODIFY &&VRNM&i CHAR(&&ML&i.) FORMAT=$&ML&i.
QUIT;
%END;
%MEND;
%CLEAN;

 

I tried different variations of "&"s and "."s but could not find one that worked.  

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If the goal is to just convert all of the character variables to minimum length needed to store the values present then something like this will work.  You can use a data step view for the first step to avoid storing the tall dataset, and keep only the variables you need, do not use LENGTHN() function as zero is an invalid length for a variable.  You can then query that view to get the max length for each variable and use it to generate one macro variable you could use in a LENGTH statement to define the variables. Then make a new dataset from the old one by placing the LENGTH statement before the SET statement so that the variables' lengths are defined before the compiler sees how they are defined in the source dataset. Add a format statement to remove any formats that might have been attached to the variables.

data _length_check / view=_length_check;
  set have;
  array chars _character_;
  length __varname $32 __length 8;
  do over chars;
    __varname=vname(chars);
    __length=length(chars);
    output;
  end;
  keep __varname __length;
run;

proc sql noprint;
  select catx(' ',nliteral(__varname),cats('$',max(__length)))
    into :_lengths separated by ' '
  from _length_check
  group by __varname 
  ;
quit;

data want ;
  length &_lengths ;
  set have;
  format _character_ ;
run;

This will have trouble if you have a lot of variables, as there is a 64K byte limit to the length of a macro variable.

It will also change the position of the variables in the new dataset.  All of the character variables will now appear before any numeric variable. Also the relative order of the character variables will be based on how the SQL query generated them.  That will be potentially random however in practice PROC SQL will order them alphabetically because of the group by clause.

 

If you want the variables positions preserved you will need a more complicated process.

View solution in original post

19 REPLIES 19
Reeza
Super User
You need two periods at the end is one issue. Check what is getting resolved using the macro debugging options:

options mprint symbolgen;

Run the code again and post the log from your macro with the debugging options on.
ballardw
Super User

When you want a value involving macro variables to have a . in the resolved value you likely need 2.

 

ORMAT=$&ML&i..

 

Example:

%let v =sometext;

%put  one dot: &v. ;

%put  two dots: &v..;

 

A single dot is used to end a macro variable so that when you may a construct like &v.P, intending a value like "sometextP" the processor knows to stop resolving &v. If you use &vp then the processor expects a macro variable named VP.

 

Also, you may be doing more work than needed. You can reference all of the character variables, if that is truly what you want with

 

format _character_  $w. if you use Proc datasets

 

data work.class;
   set sashelp.class;
run;

proc datasets library=work;
   modify class;
      format _character_ $25. ;
run;
quit;

The data step is to create a temporary version of a data set you should have available and then use proc datasets to modify the format property.

 

So your macro would only have to determine the length that you need one time and not have to have a very inefficient loop of Proc Sql alter table stuff.

 

Proc datasets is one of the of the other procs that requires QUIT to end.

 

lawatkey
Obsidian | Level 7

MPRINT(CLEAN): PROC SQL;
WARNING: Apparent symbolic reference VRNM not resolved.
SYMBOLGEN: Macro variable I resolves to 1
NOTE: Line generated by the invoked macro "CLEAN".
36 &VRNM&i CHAR(&&ML&i) FORMAT=$&ML&i.. QUIT;
_
22
200
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable ML1 resolves to 20
WARNING: Apparent symbolic reference ML not resolved.
SYMBOLGEN: Macro variable I resolves to 1
NOTE 137-205: Line generated by the invoked macro "CLEAN".
36 PROC SQL; ALTER TABLE HAVE MODIFY &VRNM&i CHAR(&&ML&i) FORMAT=$&ML&i.. QUIT;
_
2 The SAS System 08:17 Tuesday, August 25, 2020

22
ERROR 22-322: Expecting a name.

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 22-322: Expecting a format name.

 

That is what shows up in the log.  I do like that mprint option though!

 

Also, for 

data work.class;
   set sashelp.class;
run;

proc datasets library=work;
   modify class;
      format _character_ $25. ;
run;
quit;

How would I amend this to make it dynamically adjust to the maximum character lengths in a table?

Reeza
Super User
A format doesn't change the length at all - it just changes how much of it shows. Do you want to change the length or the format as they are not the same.

Any chance you imported some data using PROC IMPORT and it is truncating values and this is your attempt to fix it? Changing a length or format after the fact only change how the data is displayed so isn't a useful endeavor unless something went wrong somewhere else.

ballardw
Super User

@lawatkey wrote:

 

Also, for 

data work.class;
   set sashelp.class;
run;

proc datasets library=work;
   modify class;
      format _character_ $25. ;
run;
quit;

How would I amend this to make it dynamically adjust to the maximum character lengths in a table?


Actually I misunderstood your requirement as setting all variables to the same format.

 

I might look at this though:

proc datasets library=work noprint;
   modify have;
   %do i=1 to &varcnt.;
      format &&VRNM&i $&ML&i..;
   %end;
run;
quit; 

assuming your &ml&i. values are correct.

I think a single call to proc datasets may be faster than multiple Proc SQL alter table calls.

Tom
Super User Tom
Super User

Your subject line is confusing.

Is your goal to change the variable's defined length?  Or just the format used to display it?

You can change the formats used to display the variable without recreating the whole dataset.

If you need to change the actual definition of the variables (and not just the information on how to display the values) then you need to make a new dataset.  You could re-use the same name (as long as you are ok with loosing the original version).

Also make sure to either remove existing $xx formats attached to the variables, or at least replace them with new formats that match the new lengths of the variable.  If you have a variable that can hold up to 20 characters but have attached the $10. format to it then only the first 10 will be displayed.

lawatkey
Obsidian | Level 7

Sorry for being unclear.  I often work with and am given tables that come with formats that are much longer than required.  My issue with that is when I join back to those tables with data sets that have millions and millions of rows, these overly large formats make the table way bigger than it needs to be.  I have found that doing that alter table step and modifying the CHAR() and FORMAT($.) allows the final joined table to be much smaller, which is important in the environment I work in.

Tom
Super User Tom
Super User

If you just want the display format width to match the variable's length then why not just REMOVE any existing formats?  Then there is no need to calculate the maximum length per variable.

proc datasets nolist lib=work;
  modify have ;
    format _character_ ;
  run;
quit;
lawatkey
Obsidian | Level 7
I tried that with the table, and in the properties section, the length seems unchanged, but the format for the characters is now blank. Doesn't length impact the size? Or is it just the format?
Tom
Super User Tom
Super User

You seem to me using the word FORMAT as if it implied the variable type.  SAS has just two variable types, floating point numbers and fixed length character strings. For character variable it is the LENGTH that determines how the variable is defined.  FORMATs in SAS are just instructions for how to convert the stored values into text.  They are totally independent from how the variable is actually defined.  

 

If you are getting dataset with variables whose defined length is much larger than the longest value you can just use the COMPRESS=YES setting and SAS will remove the trailing spaces when writing the values to the disk.  You don't really need to worry too much about lengths that are longer than the values.  (Unless you are talking about huge datasets).

 

If you are worried about combining datasets that have the same variables defined with different lengths then you should get the maximum length from all of the dataset to avoid the possibility of truncation.

 

But you do also need to worry about whether the character variables have formats attached.  When combining dataset with variables with different lengths SAS will use the length from the first dataset that has the variable.  And when combining dataset with the same variable with different formats attached SAS will use the first format it sees (ignoring any datasets where the variable does not have a format attached).   

 

So you could cause lose of data because the variable is defined shorter than the values read from the other dataset.  But having the wrong format attached could also cause values to appear to be truncated because of the format with the wrong width is attached to the variable.

TomKari
Onyx | Level 15

Here's an option you might find useful:

 

Tom

 

data have;
	set sashelp.cars;
run;

proc sql noprint;
	select max(length) into :maxlen from dictionary.columns where libname = "WORK" and memname = "HAVE" and type="char";
	select name into :varname separated by " " from dictionary.columns where libname = "WORK" and memname = "HAVE" and type="char";
quit;

%put &maxlen.;
%put &varname.;

data want;
	length &varname. $ &maxlen.;
	set have;
run;
lawatkey
Obsidian | Level 7
This is unfortunately not a solution for me at all, because if one variable is legitimately 1000 characters long, but the rest are 10-20, this seems to group all characters the same way. But I wonder if your code can be put into a macro to evaluate each character variable separately? Using your macro made the table about 2-3 times its original size
TomKari
Onyx | Level 15

If you use the COMPRESS option, you can set the length of all the character variables to 32767, and the option will remove blanks at the end. Works like this:

 

data want(compress=char);

 

Tom

lawatkey
Obsidian | Level 7
So the compress option does bring the table size down, but is this compression lost if I join this table? Since I still see the formats and lengths in the properties are unchanged.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 4119 views
  • 3 likes
  • 5 in conversation