DATA Step, Macro, Functions and more

Difference between Character and Numerical Variables using Arrays

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 121
Accepted Solution

Difference between Character and Numerical Variables using Arrays

Hi,

 

The goal is to change the question answer numbers such that 1 - 5, 2 = 4, 4=2, and 5 =1.

When done with character variables, this works fine. But the same code does not work with numerical variables. With the numerical variables, the output of the array file looks the same as the original file.  Why doesn't this work the same?

---------------------

Here is the code with (Ques1-Ques5) being Character Variables:

Libname Learn'/folders/myfolders/Learn' ;

Data Survey1 ;
    Set Learn.Survey1 ;
    Array Ques(*) $ Q1-Q5 ;
    do i = 1 to dim(Ques) ;
        Ques{i} = translate(Ques{i},'54321','12345') ;
    end ;
    drop i;
run ;

-----------------

Here is a similar code, in which Ques1-Ques5 are numerical variables.

 

Libname Learn'/folders/myfolders/Learn' ;

Data Survey2 ;
    Set Learn.Survey2 ;
    Array QuesNum(*)  $ Q1-Q5 ;
    do i = 1 to dim(QuesNum) ;
        QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
    end ;
    drop i;
run ;

 

Thanks!


Accepted Solutions
Solution
‎12-04-2017 09:15 PM
Super User
Super User
Posts: 7,929

Re: Difference between Character and Numerical Variables using Arrays

[ Edited ]
Posted in reply to ManitobaMoose

You cannot insert other steps (like a PROC FORMAT step) into the middle of a DATA step.

 

PROC FORMAT is only used to create user defined formats. If you merely wish to attach a format to a variable just use the FORMAT statement. 

 

 

The PUT() statement needs a format specification. The format type needs match the value/variable type.  You can tell that $8. is a character format since it starts with a $. So you can only apply that format to character values/variables. If your variable NumQues1 is a number then you need to use a numeric format and not a character format.  If your values are really just single digits then you can use the format 1. which can also be specified as the F1. format.

 

It kind of looks like you are reading in numeric variables, converting them to character strings, translating the character strings, and then converting them back to numeric values.

 

So here is code you could try.  The array OLD will have the names of the variables read from the dataset and the array NEW will have the names of the new variables that are to be created.  The STR array will have the one character strings used to convert.  It is not really needed since you could just call the PUT() function in the next line.

 

The PUT() function will convert the numbers to character strings. It uses the F1. numeric format.

The INPUT() function will convert the translated strings to numers. It uses the F1. numeric informat.

 

data Survey2 ;
  set Learn.Survey2 (Rename = 
(Ques1= NumQues1
 Ques2= NumQues2
 Ques3= NumQues3
 Ques4= NumQues4
 Ques5= NumQues5))
  ;
  array old (5) 8 NumQues1-NumQues5 ;
  array str (5) $1 Ques1-Ques5 ;
  array new (5) 8 QuesNum1-QuesNum5 ;
  do i = 1 to dim(old);
   str(i) = put(old(i),F1.) 
   new(i) = input( translate( str(i), '54321', '12345') , F1. ) ;
  end ;
  drop =  i NumQues1-NumQues5;
run ;

 

View solution in original post


All Replies
Super User
Posts: 9,860

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

Translate() is a character function, and for it to work correctly with numerical variables, you have to make sure it works on correctly formatted strings. Do not rely on SAS's automatic conversion mechanics. I suggest you use put() and strip() for num to char, and input() for conversion back to num().

 

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Frequent Contributor
Posts: 121

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to KurtBremser

Great. I have tried that but am having problems, getting errors with converting numerical to characters using the put function.  Could you suggest how to do that?

 

I tried the following below, but it will not let me simply format to $8. It wants a specific format, such as Dollar10. etc, which is not appropriate here. I am still learning SAS, and am relatively new, so bare with me a bit. Thanks.

 

Data Survey2 ;
    Set Learn.Survey2 (Rename = (Ques1= NumQues1
                                Ques2= NumQues2
                                Ques3= NumQues3
                                Ques4= NumQues4
                                Ques5= NumQues5)) ;
    Proc Format ;
        Value QuesFmt $ 8. ;  /*THIS GENERATES AN ERROR*/
    run ;
    Ques1 =  put(NumQues1, QuesFmt) ;
    Ques2 =  put(NumQues2, QuesFmt) ;
    Ques3 =  put(NumQues3, QuesFmt) ;
    Ques4 =  put(NumQues4, QuesFmt) ;
    Ques5 =  put(NumQues5, QuesFmt) ;
    Array QuesNum(5)  $ (Ques1-Ques5) ;
    do i = 1 to 5(QuesNum) ;
        QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
    end ;
    drop =  i NumQues1 NumQues2 NumQues3 NumQues4 NumQues5;
run ;

Solution
‎12-04-2017 09:15 PM
Super User
Super User
Posts: 7,929

Re: Difference between Character and Numerical Variables using Arrays

[ Edited ]
Posted in reply to ManitobaMoose

You cannot insert other steps (like a PROC FORMAT step) into the middle of a DATA step.

 

PROC FORMAT is only used to create user defined formats. If you merely wish to attach a format to a variable just use the FORMAT statement. 

 

 

The PUT() statement needs a format specification. The format type needs match the value/variable type.  You can tell that $8. is a character format since it starts with a $. So you can only apply that format to character values/variables. If your variable NumQues1 is a number then you need to use a numeric format and not a character format.  If your values are really just single digits then you can use the format 1. which can also be specified as the F1. format.

 

It kind of looks like you are reading in numeric variables, converting them to character strings, translating the character strings, and then converting them back to numeric values.

 

So here is code you could try.  The array OLD will have the names of the variables read from the dataset and the array NEW will have the names of the new variables that are to be created.  The STR array will have the one character strings used to convert.  It is not really needed since you could just call the PUT() function in the next line.

 

The PUT() function will convert the numbers to character strings. It uses the F1. numeric format.

The INPUT() function will convert the translated strings to numers. It uses the F1. numeric informat.

 

data Survey2 ;
  set Learn.Survey2 (Rename = 
(Ques1= NumQues1
 Ques2= NumQues2
 Ques3= NumQues3
 Ques4= NumQues4
 Ques5= NumQues5))
  ;
  array old (5) 8 NumQues1-NumQues5 ;
  array str (5) $1 Ques1-Ques5 ;
  array new (5) 8 QuesNum1-QuesNum5 ;
  do i = 1 to dim(old);
   str(i) = put(old(i),F1.) 
   new(i) = input( translate( str(i), '54321', '12345') , F1. ) ;
  end ;
  drop =  i NumQues1-NumQues5;
run ;

 

Frequent Contributor
Posts: 121

Re: Difference between Character and Numerical Variables using Arrays

Thanks for everypne's help!

 

I finally settled on the code below, which works.

-------------------------

Libname Learn'/folders/myfolders/Learn' ;

Data Survey2 ;
    Set Learn.Survey2 (Rename = (Ques1= NumQues1
                                Ques2= NumQues2
                                Ques3= NumQues3
                                Ques4= NumQues4
                                Ques5= NumQues5)) ;
    Ques1 =  put(NumQues1, $8.) ;
    Ques2 =  put(NumQues2, $8.) ;
    Ques3 =  put(NumQues3, $8.) ;
    Ques4 =  put(NumQues4, $8.) ;
    Ques5 =  put(NumQues5, $8.) ;
    Array QuesNum{5} Ques1-Ques5 ; /* DO NOT PUT FORMAT OF QUES1=QUES5 AFTER ARRAY STATEMENT*/
    do i = 1 to 5 ;
        QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
    end ;
    FinalNumQues1 = input(Ques1, 8.) ;
    FinalNumQues2 = input(Ques2, 8.) ;
    FinalNumQues3 = input(Ques3, 8.) ;
    FinalNumQues4 = input(Ques4, 8.) ;
    FinalNumQues5 = input(Ques5, 8.) ;
    
    drop i NumQues1 NumQues2 NumQues3 NumQues4 NumQues5;
run ;

Super User
Posts: 9,860

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

You have not followed my advice to use the functions I named. Just expand your use of translate():

QuesNum{i} = input(translate(strip(put(QuesNum{i},best.)),'54321','12345'),best.);

The translate() function now gets a string without blanks (the put() will create leading blanks), and the result will be converted back to numeric.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Frequent Contributor
Posts: 121

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to KurtBremser

Thanks. That is useful. I will try that.

Super User
Posts: 6,622

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

It's complicated, using a character function on numeric variables.  Just take the simple route to make this an easy problem:

 

data want;

set have;

array q {5};

do i=1 to 5;

   select (q{i});

      when (1) q{i}=5;

      when (2) q{i}=4;

      when (4) q{i}=2;

      when (5) q{i}=1;

      otherwise;

   end;

end;

drop i;

run;

Super User
Super User
Posts: 7,929

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

Your TRANSLATE() trick works well for single character codes.

For numeric codes you can just use subtraction. That will work even for codes with more than 10 values.

data have ;
  input q1-q5 ;
cards;
1 2 3 4 5
3 4 3 2 1
. 4 3 2 1
;

proc print data=have;
run;

data want ;
  set have ;
  array q (5) ;
  do i=1 to dim(q);
    q(i)=6-q(i);
  end;
  drop i;
run;
proc print data=want;
run;
Obs    q1    q2    q3    q4    q5

 1      1     2     3     4     5
 2      3     4     3     2     1
 3      .     4     3     2     1

Esteemed Advisor
Posts: 5,474

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

So many ways to do the same thing. I would tend to choose something flexible as :

 

data have ;
  input q1-q5 ;
cards;
1 2 3 4 5
3 4 3 2 1
. 4 3 2 1
;

data want;
array nq{0:5} _temporary_ (. 5 4 3 2 1);
array q q1-q5;
set have;
do i = 1 to dim(q);
    q{i} = nq{coalesce(q{i},0)};
    end;
drop i;
run;

proc print noobs; run;
PG
Super User
Posts: 13,283

Re: Difference between Character and Numerical Variables using Arrays

Posted in reply to ManitobaMoose

Personally if I need to recode variables I prefer to recode question variables into analysis variables as new variables not replacing the old. Just in case there is an error or unexpected occurrence because of the values recorded in the original variable I can then look at an entire record and see where the complication occurred.

 

For most things I would start with formats if the variable is only to be used as a categorical analysis variable and only worry about recoding to numeric if I need to treat the variable as interval or ratio level. Formats are a quick way to change grouping without adding variables for categorical analysis. Frequently you may want to collapse some questions from 5 to 4,3 or even two categories for based on sample size and analysis. Running a procedure with a different format for the categories is much more efficient in terms of time, code and data maintenance than adding separate sets of variables to the analysis with different groupings.

 

Another option is reading the data with a custom informat to begin with if it was determine the original question really should have been coded as you indicate.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 445 views
  • 3 likes
  • 6 in conversation