BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NicolasH
Calcite | Level 5

Hi SAS Community,

I'm often working with datasets, where many of the variables are stored as text. For example, I'll have columns like "Gender" or various Yes/No questions that are stored as strings (e.g., Male/Female, Yes/No).

For my analysis, I'd like to convert these text variables into numeric format (e.g., 1 = Male, 2 = Female; 1 = Yes, 0 = No) and apply corresponding labels for better readability during analysis.

 

In the beginning I used plain if statements, but recently I found a better way using formats:

data mydata;
input id $ foo $ bar $;
datalines;
a Yes No
d No Typo
f No Yes
;
run;

proc format;
    value $parse_yes_no
    'Yes' = '1'
    'No' = '0'
    other = '2';
    
    value yes_no
    1 = 'Yes'
    0 = 'No'
    2 = 'Other';
run;

data mydata_numeric;
    set mydata;
    foo_num = input(put(foo, $parse_yes_no.), 8.);
    bar_num = input(put(bar, $parse_yes_no.), 8.);
    drop foo bar;
    rename foo_num=foo bar_num=bar;
run;

data mydata_labeled;
    set mydata_numeric;
    format foo yes_no. bar yes_no.;
run;

While this works, I’m sure there must be a more efficient way to do this, especially if there are a lot of columns to convert.

 

 

Is there a more efficient way to convert text fields to numeric values and apply labels, perhaps using arrays or macros to streamline this process for many variables?

 

What would be the best practice in such a scenario to maintain clean and readable code while ensuring flexibility for different datasets?

 

Any insights, ideas, or suggestions would be greatly appreciated!

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Placing a macro definition in the middle of some other step is just a recipe for confusion.

 

If you know the names of the variables in question put them into a macro variable.

%let varlist=foo bar;

You can then create a macro to generate the code you want.

%macro recode(vars);
%local i n var;
%let n=%sysfunc(countw(&vars,%str( )));
%do i=1 %to &n;
  %let var=%scan(&vars,&i,%str( ));
  _&var = input(put( &var , $parse_yes_no.),32.);
%end;
  drop &vars;
%do i=1 %to &n;
  %let var=%scan(&vars,&i,%str( ));
  rename _&var = &var ;
  format _&var yes_no. ;
%end;
%mend recode;

And then call that macro in a place where the generated SAS statements will work.

data mydata_numeric;
  set mydata;
  %recode(foo bar);
run;

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

How many variables are you talking about?  How many datasets? How many variables per dataset?

How many of the variables are using the same set of code/value pairs?  Your example is showing two variables both using the same pair.

How did you decide what codes to define for the values?

What part of the process are you trying to streamline?

What do you mean by streamlining?  Do you mean running faster? Or faster to type? Easier to read?

 

If you have multiple variables that all use the same code/value pairs then an array might help (but really only if you have more than 3 or 4 because otherwise wallpaper code is just as easy and less likely to have errors).

data mydata_numeric;
  set mydata;
  array _char_ foo bar;
  array _num_ foo_num bar_num;
  do index=1 to dim(_char_;
    _num_[index] = input(put(_char_[index], $parse_yes_no.), 32.);
  end;
  drop index foo bar;
  rename foo_num=foo bar_num=bar;
  format foo_num bar_num yes_no. ;
run;

If you are really interested in stream lining the conversion of a LOT of variables for the same dataset then it might be worth while to assign the character to digit string format(s) to the variables and dump them to a text file and then just read them back into numeric vairables. So something like this.  (Note you need to make sure to read the variables back in in the right order).

filename csv temp;
data _null_;
  set mydata ;
  format foo bar $parse_yes_no.;
  file csv dsd ;
  put (_all_) (+0);
run;
data mydata_numeric;
  length var1 8 var2 $10 foo bar 8 varlast $20;
  infile csv dsd ;
  input var1 -- varlist;
  format foo bar yes_no. ;
run;

 

NicolasH
Calcite | Level 5

Thanks for the fast and detailed reply. 

 

In my example I posted, I just have two columns for better readability here, but as mentioned, I'm doing this operation quite often (and btw. I believe doing it in SAS is better than doing it in Excel before importing).

 

In the case I've right now, I have 23 variables in one dataset, and they're all yes/no. I thought about the array approach you suggest already because it would even allow me to just use _character_, but I got stuck with rename and drop, which I wasn't able to put in the loop.

 

Personally, I think it's not only more cumbersome to write the rename statement with 23 variables, but I'm also afraid of making errors while doing so. 

 

Btw, in case it helps, I asked ChatGPT to build me the rename statement, this is what it came up with, but I couldn't make it work:

data new_dataset;
    set given;
    array char_vars _character_;
    /* Here would be code which creates a numeric copy of each char_vars variable with _num suffix */
    rename
        %macro rename_back();
            %do k = 1 %to dim(char_vars);
                %let old_varname = %sysfunc(varname(&char_vars[k]));
                %let new_varname = &old_varname._num;
                &new_varname = &old_varname
            %end;
        %mend rename_back;
        %rename_back;
run;
Tom
Super User Tom
Super User

Placing a macro definition in the middle of some other step is just a recipe for confusion.

 

If you know the names of the variables in question put them into a macro variable.

%let varlist=foo bar;

You can then create a macro to generate the code you want.

%macro recode(vars);
%local i n var;
%let n=%sysfunc(countw(&vars,%str( )));
%do i=1 %to &n;
  %let var=%scan(&vars,&i,%str( ));
  _&var = input(put( &var , $parse_yes_no.),32.);
%end;
  drop &vars;
%do i=1 %to &n;
  %let var=%scan(&vars,&i,%str( ));
  rename _&var = &var ;
  format _&var yes_no. ;
%end;
%mend recode;

And then call that macro in a place where the generated SAS statements will work.

data mydata_numeric;
  set mydata;
  %recode(foo bar);
run;
NicolasH
Calcite | Level 5

Amazing! Thank you, that was exactly the solution I dreamed of!

ballardw
Super User

One bit is to use INVALUE to create custom INFORMATS.

You may want to be a bit careful of OTHER though. What if the value is blank? Wouldn't missing make more sense?

Also where practical I try to use the custom informats when reading the data from an external source .

 

Here is minor example of some different behavior available:

Note, since the INVALUE is to create numeric values it does not use a name starting with $.

data mydata;
input id $ foo $ bar $;
datalines;
a Yes No
b yes no
c yES nO
d No Typo
f No Yes
g . no
;
run;

proc format;
    invalue parse_yes_no (upcase)
    'YES' = 1
    'NO' = 0
    ' ','.' = .
    other = _error_;
    
    value yes_no
    1 = 'Yes'
    0 = 'No'
    ;
run;

data mydata_numeric;
    set mydata;
    foo_num = input(foo, parse_yes_no.);
    bar_num = input(bar, parse_yes_no.);
run;

And an example of use with "external" data files or datalines:

data mydata2;
input id $ foo :parse_yes_no. bar :parse_yes_no.;
datalines;
a Yes No
b yes no
c yES nO
d No Typo
f No Yes
g . no
;

If you read the LOG you will see Invalid Data messages for the "other" values. The resulting variable with be missing but you get a note in the log telling you that there are unexpected values and what that value is. So you can take steps to resolve it, such as modifying the Informat (some one entered data in Spanish Non or SI for example where you could add those to the informat.

 

Do note the use the option UPCASE on the Invalue statement which is not available for a Value statement. That converts the text to all uppercase before comparing to the list of values. So you don't need to provide all 8 possible spellings of Yes if case is considered (think hand entered data by careless staff). 

 

If you have "many" sets of values involved and have the information in a data set you can use a CNTLIN option to use a specially structured data set to create the formats/informats.

Run this code:

Proc format library=work cntlout=work.myformats;
run;

This will create a data set in the work library named Myformats that contains the definitions of all the formats and informats in that special structure I mentioned.

Some things with the dataset approach: any "OTHER" used must be the last value in the data set. Also all values for a given FMTNAME must be together in the data set. If there is a gap then you in effect create two formats with the second one replacing the previous version.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 266 views
  • 2 likes
  • 3 in conversation