I am working with a survey from survey monkey and unfortunately the data was imported with the column values in words and complete sentences. So for each question the answers to the question come in separate columns for Instance:
q1. Have taken care of someone dying before?
a. yes
b. No
c. Maybe
yes, No and Maybe all come out as variables with columns of their own instead of just being values under a single variable called question 1.
So, I have been trying to combine the separate answers into a single column all under q1 by using 'if then ' statements but I have an issue when it comes to questions with answers that are a whole lengthy sentence. I try to combine them under a new variable and it seems to run but but I end up with an empty column.
My log file doesn't claim any major errors but just puts an exclamation mark next to the if then statements.
i'll appreciate any help I can get with this. I have attached the code, and a picture of the dataset table when I open it in table view in SAS and also my log file.
data two; set two;
if strongInf='Are a strong influence on my attitude toward death and dying' then ReligBlf='Strong Religious Influence';
if minorInf='Are a minor influence on my attitude toward death and dying' then ReligBlf='Minor Religious Influence';
if noInf='Do not influence my attitude toward death and dying' then ReligBlf='No Religious Influence';
run;
So in this instance the survey question was about whether religious beliefs influenced respondents attitude towards dying patients. The response options are the three variables above, which come in 3 separate columns. Instead of just one as answers under the variable religious beliefs. So I tried to recombine them into one under a new variable 'ReligBlf'.
So I try to create a new variable with the 3 columns combined but always end up with an empty new column. "Relig Blfs"
This is my log file
I'll appreciate any help.
Thanks.
Please post data in a usable form. Makes it easier to help you.
See if you can use this as a template 🙂
strongInf='Are a strong influence on my attitude toward death and dying' ; output;
strongInf = ''; minorInf='Are a minor influence on my attitude toward death and dying' ; output;
strongInf = ''; minorInf= ''; noInf='Do not influence my attitude toward death and dying'; output;
run;
data two(drop = idx);
set one;
array a strongInf minorInf noInf;
array inf{3} $ 25 _temporary_ ('Strong Religious Influence',
'Minor Religious Influence',
'No Religious Influence');
idx = whichc(coalescec(of a[*]), of a[*]);
ReligBlf = inf[idx];
run;
Please post the data you have as data step using datalines, so that we actually see what you are talking about. And always post log and code as text using the "Insert SAS Code" button. It is almost always a bad idea to overwrite the source dataset with a data step.
Please post data in a usable form. Makes it easier to help you.
See if you can use this as a template 🙂
strongInf='Are a strong influence on my attitude toward death and dying' ; output;
strongInf = ''; minorInf='Are a minor influence on my attitude toward death and dying' ; output;
strongInf = ''; minorInf= ''; noInf='Do not influence my attitude toward death and dying'; output;
run;
data two(drop = idx);
set one;
array a strongInf minorInf noInf;
array inf{3} $ 25 _temporary_ ('Strong Religious Influence',
'Minor Religious Influence',
'No Religious Influence');
idx = whichc(coalescec(of a[*]), of a[*]);
ReligBlf = inf[idx];
run;
First thing I will say is I feel your pain. I have worked with several Survey Monkey data sets and did not enjoy the process at all.
On way to do this is to move the "logic" into custom formats. Which can be very helpful if you have repeated values. I will show an example for one set of arbitrary values. The process is routine and tedious but has been easier than lots of If/then/else code.
data start; infile datalines dlm='|' missover dsd; informat val1 val2 val3 $50.; input val1 val2 val3; datalines; Text value first response|| |Text value for the second response| ||This is the third response value ; Proc format; value $question "Text value first response" = 'First response' "Text value for the second response" = 'Second response' "This is the third response value" = 'Third response' ; run; data example; set start; Questionval = put(coalescec(val1,val2,val3),$question.); run;
What the above program does:
First is to create a small example data set with output similar to a 3-level response question. The text is slightly different on purpose to make it just a bit closer to what you show.
The format creates one set character format that will display the value on the left as the text on the right of the equal sign. The Value says this is a format, the $ must be used for character values. You CANNOT use digits as the last characters in the name of the format. Which is sometimes unfortunate as often it helps to have a format with a name similar to the values worked with and the variable names might be Q1 for question 1.
The second data step uses the format. Sort of a trick is the Coalescec function. This returns the first value of any of the variables that is not listed. Which means with single response questions it gets the one actual response. Then we apply the format to create the desired text value.
If you have multiple questions that use the same response set like Yes, No, Refused you can see that code gets a lot quicker as you could have something like:
Q2value = put(coalescec(Q2_a,Q2_b,Q2_c),$ynr.); Q3value = put(coalescec(Q3_a,Q3_b,Q3_c),$ynr.); Q4value = put(coalescec(Q4_a,Q4_b,Q4_c),$ynr.);
Which reuses the logic in the format instead of if/then/else.
Caveat: This will not work with multiple response variables because it would result in the first one. But multiple response questions are another store for everything.
A hint that can reduce coding like this even more. Name your variables with a common Stem and the difference at the end. Instead of Stronginf MinorInf Noinf use InfStrong InfMinor InfNo. One minor reason is that sorts of the variable names will place them together. Surveys often have a built in structure for this: Q1, Q2, Q3 for question 1 , question 2 question 3. As in my repeated use code, it is easy to do a search and replace for "Q1" and replace with "Q2" to modify the syntax.
A big advantage of this sort of naming is that you can use Variable lists much easier. You can reference every variable in a data step whose name starts with Q1 by using Q1: in places that lists of variables are accepted. Or if you want to just the first 5 of the 10 variables in Q10, if they are named Q10_1 through Q10_10 you can use a list like Q10_1 - Q10_5.
Can reduce code typing to something like:
Q2value = put(coalescec(of Q2_:),$ynr.); Q3value = put(coalescec(of Q3_:),$ynr.); Q4value = put(coalescec(of Q4_:),$ynr.); Q10Value= put(coalescec(of Q10_1-Q10_5:),$otherformat.);
The OF in the function says a list is expected.
You will note that _ in position variable names is not in the QxValue name. That can avoid potential accidental inclusion by using Q2: <which would also find Q20 , Q21, Q22, ..., Q200>
Side notes.
There is a comparable COALESCE function (no ending C) for Numeric valued variables. So if you have numeric result like a "number of times" question you may not need the format and can get the numeric value directly. May. If you had a "more than 10" then how to handle that becomes an issue. But this may help if you have 1 to 5 or 1 to 10 scale questions depending on how the survey was programmed.
A second similar approach if you want actual ordinal numeric values from characters is to create an Informat with an INVALUE such as:
Proc format; invalue question "Text value first response" = 1 "Text value for the second response" = 2 "This is the third response value" = 3 ; run; data example; set start; Questionval = INput(coalescec(val1,val2,val3),question.); run;
Note that the informat name on the INVALUE statement does not have a $ because we are going to create numeric values and that the data step uses INPUT to convert the text of the Val1,2,3 to an ordinal number. An additional format can be used to display desired text as well.
This ordinal numeric approach would be one way to handle the numeric response with a "more than 10" choice by assigning the text "more than 10" to some numeric value such as 15 or whatever makes sense to you.
This approach doesn't require additional code to deal with all missing values. BUT you could have if want to have a value like "no response"
data start; infile datalines dlm='|' missover dsd; informat val1 val2 val3 $50.; input val1 val2 val3; datalines; Text value first response|| |Text value for the second response| ||This is the third response value || ; Proc format; value $question "Text value first response" = 'First response' "Text value for the second response" = 'Second response' "This is the third response value" = 'Third response' " " = 'No response' ; data example; set start; Questionval = put(coalescec(val1,val2,val3),$question.); run;
Lastly, you may want to check with the people accessing Survey Monkey about the data export options.
I haven't used it for several years but I think, if the questions are set up right and the correct export option is set you can get single responses for single answer multiple choice questions.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.