BookmarkSubscribeRSS Feed
RichardAD
Quartz | Level 8

Consider the requirement to convert numeric variables to same named character variables, and the character values must be the formatted numeric values, and the original column order must be maintained.  More generally, conversion is rendering process.

 

My first cut is 

%macro num_to_char(data=) ;
  %local vars renames reassigns drops formats;

  proc contents noprint data=&data out=n2c_contents ; run ;
  proc sql noprint ;
    select trim(name)||'$'||cats(ifn(type=1,32,length))
    into :vars separated by ' ' from n2c_contents order by varnum ;
    select cats(name,'=_',varnum), cats(name,'=left(vvalue(_',varnum,'))'), cats('_',varnum)
    , cats('_',varnum)||' '||ifc(format='','best32.',cats(format,formatl,'.',formatd))
    into :renames separated by ' ', :reassigns separated by ';', :drops separated by ' '
    , :formats separated by ' '
    from n2c_contents where type = 1 ;
    
    drop table n2c_contents ;
  quit ;
  %if %length(&renames) %then %do ;
  data &data ;
    length &vars ;
    set &data (rename=(&renames)) ;
    format &formats ;
    &reassigns ;
    drop &drops ;
  run ;    
  %end ;
%mend num_to_char;

If this macro is for a general library, how much harder should it go?

The current implementation has all converted values being stored in a character variable of length $32 (call this the target/container/destination).

Which, if any, of these would be worth pursuing?

  • Make the target length match the numeric format length
    • Use format default length in dictionary.formats if not in data set metadata
  • Shortening the target length to only necessary length (max rendered length)
    • For example: binary numeric variables 0/1 do not need $32 when converted
  • With shortened target lengths want to avoid asteriks and truncated results
11 REPLIES 11
Tom
Super User Tom
Super User

Is there a question here?

 

A word of warning if you are going to use CATS() with with the FORMATL and FORMATD variables of PROC CONTENTS output you will probably want to make sure the values are not zero.

RichardAD
Quartz | Level 8

The question was looking for suggestions on how deep a rabbit hole a %num_to_char macro should go.

 

Thanks for pointing out formatL=0 case.  FormatL is zero when a variable has no format associated with it, and that is the only case where I would force the format best32.  When FormatL > 0 I would let conversion go via vvalue() using the variable's associated format.  FormatL > 32 would also be a good condition for logging a warning about possible truncation of the conversion.

 

I adjusted the macro use

    , ifc(formatL=0,cats('_',varnum)||' best32.','') /* codegen a conversion format only if variable does not have one */

A look at my dictionary.formats shows there are no formats that could have formatL=0 (i.e. 0 rows where minw=0)

Tom
Super User Tom
Super User

No.  FORMATL and FORMATD are zero when the length or the decimal places are not specified, whether or not a FORMAT was specified.  Try this simple example.

data class;
 set sashelp.class;
 format age best.;
run;

proc contents data=class out=contents noprint; run;
proc sql;
select varnum,name,format,formatl,formatd
from contents
order by varnum
;
quit;

Tom_0-1746456753877.png

So when the FORMATL is zero and the FORMAT is specified than the only way to know what length to use to define the variable is to know the DEFAULT length that the attached format uses.

 

You might be able to get SAS to figure that out for you by first referencing the variable as the result of an assignment statement that uses the PUT() function.

char_age=put(age,best.);

But for that type of construction it will be easier to use DICTIONARY.COLUMNS to retrieve the format information for the variable.  That is because its version of the variable FORMAT has the full format specification instead of just the name of the format that appears in the PROC CONTENTS generated dataset.

ballardw
Super User

First answer WHY?

The order of variables inside a data step has never been important to any solution I have written in 30+ years using SAS.

There have been a few times that it is convenient to have some variables in a different order but it never matched your "convert numeric to character- keep the same name- keep the same column order" requirement.

 

And next to a trivial exercise using two data steps. See many of the solutions involving "order of variables" questions.

 

And my other question is why is the value numeric if you need it character to begin with? This almost smells like a "fix" to make a proper data set (numeric values for numeric variables) look like a prior poor choices or accidentally read file that should have had numeric values but because of a read (or likely IMPORT problem) some columns came in as character. So this forces duplication of a prior error.

 

What if some of the numeric variables are occasionally missing? What should the character representation be?

Where do you get the existing lengths of the character variables to assign the proper length to the converted version? Length of character variables not matching is a very common problem when attempting to combine data sets. So are these supposed to match the lengths of any other character variables in other data sets?

 

I have a suspicion that date, time and or datetime values may not respond well to some of this.

Custom formats may be a serious problem as well, especially a format that that turns short numeric codes into longish text strings such as 1 = "Nationality Unknown incomplete history".

Short test confirms this:

Proc format ;
value code
1='Nationality unknown incomplete history';
run;

data junk;
   ncode=1;
   format ncode code.;
run;

 proc contents noprint data=work.junk out=n2c_contents ; run ;

Which does not have format=' ' and the formatl and formatd are both 0. So your reassign is going to attempt to assign 0.0 as a format which likely is going to fail. So you need something that verifies none of your variables  have such custom formats assigned.

 

The specific check for BEST32 also sort of points to a connection with Proc Import as typically the default format is BEST12 for variables created in data steps and many procedures. Best32 tends to appear with imported data.

 

RichardAD
Quartz | Level 8

The situation is indeed an IMPORT issue.  Specifically XLSX import wherein a column should be character for downstream (legacy) processing, and the column managed to be imported as numeric (some use cases are getting all values in a column being numeric seeming) due to upstream changes.  Maintaining column order is also for legacy output.

 

Thanks for pointing out concerns for long custom format renderings.

Tom
Super User Tom
Super User

@RichardAD wrote:

The situation is indeed an IMPORT issue.  Specifically XLSX import wherein a column should be character for downstream (legacy) processing, and the column managed to be imported as numeric (some use cases are getting all values in a column being numeric seeming) due to upstream changes.  Maintaining column order is also for legacy output.

 

Thanks for pointing out concerns for long custom format renderings.


For PROC IMPORT from XLSX you can usually force all variables to be character by using GETNAMES=NO statement (assuming the column headers are text).  You can easily use PROC TRANSPOSE to get the list of column headers from the first observations to help you generate RENAME statement or RENAME= option.

proc transpose data=have(obs=1) out=names ;
  var _all_;
run;
proc sql noprint ;
select catx('=',_name_,nliteral(col1)) into :renames separated by ' '
from names;
quit;
data want;
  set have(firstobs=2 rename=(&renames));
run;

 

You can also take advantage of one of the annoying "features" of PROC IMPORT to help with this.  If every copy of a field in a delimited text file is enclosed in quotes then PROC IMPORT assumes the variable should be character even if the values are all numbers.  So dump your file to a text file using the ~ modifier and re-import.

filename csv temp;
proc transpose data=have(obs=0) out=names;
  var _all_;
run;
data _null_;
  file csv dsd ;
  set names;
  put _name_ ~ @ ;
run;
data _null_;
  file csv dsd mod;
  set have;
  put (_all_) (~);
run;
proc import file=csv dbms=csv out=want replace;
run;
Quentin
Super User

I had an old sas ballot item (seems not to have been carried over to the SAS product suggestions) asking for the XLSX engine to support the DBSASTYPE option.  That would help with problems like this.

 

But if the goal is to make data fit a pre-determined template in order to fit into legacy processing, seems like instead of a dynamic %num_to_char, you might consider writing a macro that allows you to read a data dictionary (with variable types, lengths, position, etc) and then transforms a dataset to fit that dictionary.

mkeintz
PROC Star

I agree that much of the time order of the variables I produce in a data set are not important.  But there are two situations in which I do care about variable order:

 

  1. I want a quick onscreen view of the data.
  2. It is effective to use the double-dash syntax to declare a list of variables
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

@mkeintz wrote:

I agree that much of the time order of the variables I produce in a data set are not important.  But there are two situations in which I do care about variable order:

 

  1. I want a quick onscreen view of the data.
  2. It is effective to use the double-dash syntax to declare a list of variables

Exactly.    Also placing the sort keys first makes it easier to understand. It is extremely annoying when trying to examine a unfamiliar dataset to have the first screen filled with unimportant or empty variables.

Ksharp
Super User

The most convenient way is using VVALUE() function.

 

data have;
 set sashelp.heart;
run;





proc sql noprint;
select name into :vname separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE';
select cats(name,'=_',name) into :rename separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and type='num';
select name into :num_vname separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and type='num';
select cats('_',name) into :_num_vname separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and type='num';
quit;
data temp;
 set have(rename=(&rename.));
 array x{*} &_num_vname ;
 array y{*} $ 32 &num_vname ;
 do i=1 to dim(x);
   y{i}=strip(vvalue(x{i}));
 end;
 drop i &_num_vname;
 run;
 data want;
 retain &vname.;
 set temp;
 run;
data_null__
Jade | Level 19
 

@RichardAD wrote:

Consider the requirement to convert numeric variables to same named character variables, and the character values must be the formatted numeric values, and the original column order must be maintained.  More generally, conversion is rendering process.

Consider PROC TRANSPOSE. 

 

data heart;
   _obs_ + 1;
   set sashelp.heart;
   run;
proc contents varnum;
   run;
options missing=' ';
proc transpose prefix=_val data=heart out=flip(where=(_name_ ne '_obs_'));
   by _obs_;
   var _all_;
   run;
data flip;
   set flip;
   _val1 = left(_val1);
   run;
proc transpose data=flip out=flop(drop=_obs_ _name_);
   by _obs_;
   var _val1;
   id _name_;
   idlabel _label_;
   run;
proc contents varnum;
   run;

Capture.PNG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1954 views
  • 2 likes
  • 7 in conversation