- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
The goal is to change the question answer numbers such that 1 - 5, 2 = 4, 4=2, and 5 =1.
When done with character variables, this works fine. But the same code does not work with numerical variables. With the numerical variables, the output of the array file looks the same as the original file. Why doesn't this work the same?
---------------------
Here is the code with (Ques1-Ques5) being Character Variables:
Libname Learn'/folders/myfolders/Learn' ;
Data Survey1 ;
Set Learn.Survey1 ;
Array Ques(*) $ Q1-Q5 ;
do i = 1 to dim(Ques) ;
Ques{i} = translate(Ques{i},'54321','12345') ;
end ;
drop i;
run ;
-----------------
Here is a similar code, in which Ques1-Ques5 are numerical variables.
Libname Learn'/folders/myfolders/Learn' ;
Data Survey2 ;
Set Learn.Survey2 ;
Array QuesNum(*) $ Q1-Q5 ;
do i = 1 to dim(QuesNum) ;
QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
end ;
drop i;
run ;
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You cannot insert other steps (like a PROC FORMAT step) into the middle of a DATA step.
PROC FORMAT is only used to create user defined formats. If you merely wish to attach a format to a variable just use the FORMAT statement.
The PUT() statement needs a format specification. The format type needs match the value/variable type. You can tell that $8. is a character format since it starts with a $. So you can only apply that format to character values/variables. If your variable NumQues1 is a number then you need to use a numeric format and not a character format. If your values are really just single digits then you can use the format 1. which can also be specified as the F1. format.
It kind of looks like you are reading in numeric variables, converting them to character strings, translating the character strings, and then converting them back to numeric values.
So here is code you could try. The array OLD will have the names of the variables read from the dataset and the array NEW will have the names of the new variables that are to be created. The STR array will have the one character strings used to convert. It is not really needed since you could just call the PUT() function in the next line.
The PUT() function will convert the numbers to character strings. It uses the F1. numeric format.
The INPUT() function will convert the translated strings to numers. It uses the F1. numeric informat.
data Survey2 ;
set Learn.Survey2 (Rename =
(Ques1= NumQues1
Ques2= NumQues2
Ques3= NumQues3
Ques4= NumQues4
Ques5= NumQues5))
;
array old (5) 8 NumQues1-NumQues5 ;
array str (5) $1 Ques1-Ques5 ;
array new (5) 8 QuesNum1-QuesNum5 ;
do i = 1 to dim(old);
str(i) = put(old(i),F1.)
new(i) = input( translate( str(i), '54321', '12345') , F1. ) ;
end ;
drop = i NumQues1-NumQues5;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Translate() is a character function, and for it to work correctly with numerical variables, you have to make sure it works on correctly formatted strings. Do not rely on SAS's automatic conversion mechanics. I suggest you use put() and strip() for num to char, and input() for conversion back to num().
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Great. I have tried that but am having problems, getting errors with converting numerical to characters using the put function. Could you suggest how to do that?
I tried the following below, but it will not let me simply format to $8. It wants a specific format, such as Dollar10. etc, which is not appropriate here. I am still learning SAS, and am relatively new, so bare with me a bit. Thanks.
Data Survey2 ;
Set Learn.Survey2 (Rename = (Ques1= NumQues1
Ques2= NumQues2
Ques3= NumQues3
Ques4= NumQues4
Ques5= NumQues5)) ;
Proc Format ;
Value QuesFmt $ 8. ; /*THIS GENERATES AN ERROR*/
run ;
Ques1 = put(NumQues1, QuesFmt) ;
Ques2 = put(NumQues2, QuesFmt) ;
Ques3 = put(NumQues3, QuesFmt) ;
Ques4 = put(NumQues4, QuesFmt) ;
Ques5 = put(NumQues5, QuesFmt) ;
Array QuesNum(5) $ (Ques1-Ques5) ;
do i = 1 to 5(QuesNum) ;
QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
end ;
drop = i NumQues1 NumQues2 NumQues3 NumQues4 NumQues5;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You cannot insert other steps (like a PROC FORMAT step) into the middle of a DATA step.
PROC FORMAT is only used to create user defined formats. If you merely wish to attach a format to a variable just use the FORMAT statement.
The PUT() statement needs a format specification. The format type needs match the value/variable type. You can tell that $8. is a character format since it starts with a $. So you can only apply that format to character values/variables. If your variable NumQues1 is a number then you need to use a numeric format and not a character format. If your values are really just single digits then you can use the format 1. which can also be specified as the F1. format.
It kind of looks like you are reading in numeric variables, converting them to character strings, translating the character strings, and then converting them back to numeric values.
So here is code you could try. The array OLD will have the names of the variables read from the dataset and the array NEW will have the names of the new variables that are to be created. The STR array will have the one character strings used to convert. It is not really needed since you could just call the PUT() function in the next line.
The PUT() function will convert the numbers to character strings. It uses the F1. numeric format.
The INPUT() function will convert the translated strings to numers. It uses the F1. numeric informat.
data Survey2 ;
set Learn.Survey2 (Rename =
(Ques1= NumQues1
Ques2= NumQues2
Ques3= NumQues3
Ques4= NumQues4
Ques5= NumQues5))
;
array old (5) 8 NumQues1-NumQues5 ;
array str (5) $1 Ques1-Ques5 ;
array new (5) 8 QuesNum1-QuesNum5 ;
do i = 1 to dim(old);
str(i) = put(old(i),F1.)
new(i) = input( translate( str(i), '54321', '12345') , F1. ) ;
end ;
drop = i NumQues1-NumQues5;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for everypne's help!
I finally settled on the code below, which works.
-------------------------
Libname Learn'/folders/myfolders/Learn' ;
Data Survey2 ;
Set Learn.Survey2 (Rename = (Ques1= NumQues1
Ques2= NumQues2
Ques3= NumQues3
Ques4= NumQues4
Ques5= NumQues5)) ;
Ques1 = put(NumQues1, $8.) ;
Ques2 = put(NumQues2, $8.) ;
Ques3 = put(NumQues3, $8.) ;
Ques4 = put(NumQues4, $8.) ;
Ques5 = put(NumQues5, $8.) ;
Array QuesNum{5} Ques1-Ques5 ; /* DO NOT PUT FORMAT OF QUES1=QUES5 AFTER ARRAY STATEMENT*/
do i = 1 to 5 ;
QuesNum{i} = translate(QuesNum{i},'54321','12345') ;
end ;
FinalNumQues1 = input(Ques1, 8.) ;
FinalNumQues2 = input(Ques2, 8.) ;
FinalNumQues3 = input(Ques3, 8.) ;
FinalNumQues4 = input(Ques4, 8.) ;
FinalNumQues5 = input(Ques5, 8.) ;
drop i NumQues1 NumQues2 NumQues3 NumQues4 NumQues5;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You have not followed my advice to use the functions I named. Just expand your use of translate():
QuesNum{i} = input(translate(strip(put(QuesNum{i},best.)),'54321','12345'),best.);
The translate() function now gets a string without blanks (the put() will create leading blanks), and the result will be converted back to numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. That is useful. I will try that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's complicated, using a character function on numeric variables. Just take the simple route to make this an easy problem:
data want;
set have;
array q {5};
do i=1 to 5;
select (q{i});
when (1) q{i}=5;
when (2) q{i}=4;
when (4) q{i}=2;
when (5) q{i}=1;
otherwise;
end;
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your TRANSLATE() trick works well for single character codes.
For numeric codes you can just use subtraction. That will work even for codes with more than 10 values.
data have ;
input q1-q5 ;
cards;
1 2 3 4 5
3 4 3 2 1
. 4 3 2 1
;
proc print data=have;
run;
data want ;
set have ;
array q (5) ;
do i=1 to dim(q);
q(i)=6-q(i);
end;
drop i;
run;
proc print data=want;
run;
Obs q1 q2 q3 q4 q5 1 1 2 3 4 5 2 3 4 3 2 1 3 . 4 3 2 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So many ways to do the same thing. I would tend to choose something flexible as :
data have ;
input q1-q5 ;
cards;
1 2 3 4 5
3 4 3 2 1
. 4 3 2 1
;
data want;
array nq{0:5} _temporary_ (. 5 4 3 2 1);
array q q1-q5;
set have;
do i = 1 to dim(q);
q{i} = nq{coalesce(q{i},0)};
end;
drop i;
run;
proc print noobs; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Personally if I need to recode variables I prefer to recode question variables into analysis variables as new variables not replacing the old. Just in case there is an error or unexpected occurrence because of the values recorded in the original variable I can then look at an entire record and see where the complication occurred.
For most things I would start with formats if the variable is only to be used as a categorical analysis variable and only worry about recoding to numeric if I need to treat the variable as interval or ratio level. Formats are a quick way to change grouping without adding variables for categorical analysis. Frequently you may want to collapse some questions from 5 to 4,3 or even two categories for based on sample size and analysis. Running a procedure with a different format for the categories is much more efficient in terms of time, code and data maintenance than adding separate sets of variables to the analysis with different groupings.
Another option is reading the data with a custom informat to begin with if it was determine the original question really should have been coded as you indicate.