The survey presents many questions with multiple choice answers including "if Other, specify" as an option. Some of the answers provided by the survey respondents are duplicated. I've received 300+ other responses of which some are misspelled or use acronyms.
The SAS syntax code below shows the basic descriptive statistics I am collecting for the data and descriptions of each data step:
Title "Question 8";
Which of the following personnel work at your facility?
1- Infection control staff
2- manager
3- director
4- specialist
5- technician
6- Other, please specify:
The corresponding Variables are listed below:
ICP MAN DIR Spec Tech Oth OthSfy;
libname saslib "\\my library*; *creates a permanent SAS File. Creating the library allows us to refer to a file in a specific folder.*;
run;
data year2020;
set saslib.survey;
run;
/* List variable*/;
proc print data=year2020;
var ICP MAN DIR Spec Tech Oth OthSfy;
run;
/*Format the question responses*/
proc format;
value ICP MAN DIR Spec Tech Oth OthSf
1="yes" 2="no";
run;
/* Calculate summary statistics */;
proc means data= year2020;
var ICP MAN DIR Spec Tech Oth OthSf;
run;
/* Examine extreme values */ ;
proc univariate data= year2020;
var ICP MAN DIR Spec Tech Oth OthSf;
run;
/* List unique values and frequencies */
proc freq data= year2020;
tables ICP MAN DIR Spec Tech Oth OthSf;
run;
Example of some of the answers and "Other specify" survey responses:
administrative assistant
admin. assit.
admin. assit
manager assistant
man. assit.
etc.
How would you combine the responses for "If Other, Specify"? I do not want to get rid of the OthSf variable. I want to create several variables that would combine those responses that are duplicated. I have had issues because there are too many characters.
I would also appreciate any insight on how I can extract this data to create excel pivot tables and graphs in powerpoint to display the data capturing all the responses.
Thank you!
Thank you for your guidance it was very helpful! I didn't use arrays and did the following instead:
/*This SAS code allows you to combine open ended survey responses and create new variables or categories to capture the answers specified within the If other specify multiple choice answer*/
*IMPORT: to import the SAS file and create a library*;
libname saslib "\\(ENTER YOUR LIBRARY PATHWAY HERE"; *creates a permanent SAS File. Creating the library allows us to refer to a file in a specific folder.*;
run;
*Look at your dataset*
data year19;
set saslib.survey;
run;
proc contents data=year19 VARNUM out=contents; *VarNUM list the variables in the order that they were created;
run;
proc sort data=year19;
by surveyYears;
run;
/* List variable*/;
proc print data=year19;
var admin educat tech Other OthSpecify;
run;
/*Format the question responses*/
proc format;
value admin educat tech Other OthSpecify;
1="yes" 2="no";
run;
/* Calculate summary statistics */;
proc means data= year19;
var admin educat tech Other OthSpecify;
run;
/* Examine extreme values */ ;
proc univariate data= year19;
var admin educat tech Other OthSpecify;
run;
/* List unique values and frequencies */
proc freq data= year19;
tables admin educat tech Other OthSpecify;
run;
*FINDINGS: In 2019, the survey respondents selected the following multiple choice options (output descriptive statistics listed below and changed for privacy)
1- admin
N: 900 (number of people responding no)/90% (percentage of people responding no)
Y: 100/10%
2- educat
N: 900/90%
Y: 100/10%
3- tech
N: 900/90%
Y: 100/10%
4- Other
N: 900/90%
Y: 100/10%
5- OthSpecify
N: 900/90%
Y: 100/10%
/* if Other, please specify, recoded below*/
*Using one recoding variable*;
data year19_1; /* create a new dataset (name) for SAS to make any changes to your data*/
set year19; /*old dataset name*/
*Create new variables for the categories above that could have been selected as a response. Meaning, some of the OthSpecify responses should have been one of the 4 multiple choice options available*;
admin_recode = 0;
educat_recode = 0;
tech_recode = 0;
Other_recode = 0;
*Here, setting the initial value of these variables to 0. The value will change if the variable meets the criteria outlined below for each observation. Below we recode those other specify responses that belong to one of the existing multiple choice answers.;
if OthSpecify in
("admin assistant"
"adminn"
"admin assistants")
then admin_recode = 1;
/*you are putting all the values that need to be recoded as admin in the ()*/
else if OthSpecify in
"educator"
"educ")
then educat_recode = 1;
else if OthSpecify in
('technology assistant"
"techs")
then tech_recode = 1;
*Create a final variable that will capture all situations where a respondent indicated a response to the multiple choice answer which will = Yes or provided a value for OthSpecify had a write-in answer;
*Create a final admin variable;
if admin_recode = 1 or admin = "Y" then admin_final = 1;
else admin_final = 0;
*Create a final educat variable;
if educat _recode = 1 or educat = "Y" then educat _final = 1;
else educat _final = 0;
*Create a final tech variable;
if tech _recode = 1 or tech = "Y" then tech _final = 1;
else tech _final = 0;
*Create a final Oth variable;
If
admin_final = 0 and
educat_final = 0 and
tech_final = 0 /*make sure to not add the word and to the last variable*/
then OthSpecify_final = 1;
else OthSpecify_final = 0;
run;
*Printing new recodes*;
proc freq data=year19_1;
tables admin_final educat_final tech_final Other_final OthSpecify_final;
run;
proc print data=year19_1;
var admin_final educat_final tech_final Other_final OthSpecify_final;
run;
Lots of ways to handle some of this depending on several factors.
The first step I always take when dealing with open end responses, whether for multiple or single choice questions, is to see if the "open end" response is really one of the existing choices. For those that should be considered identical to one of the listed categories then the process would be to 1) set the value of the existing category and 2) remove the open end text that had been coded. Sometimes in the case of multiple choice responses this means that there is still an open end, just shortened.
The second step was to standardize the spelling of the non-listed items. Then after standardizing them count the identical responses. If there were "enough" of a single category then perhaps create any category variable. "Enough" could be a very flexible number depending on the overall number of respondents, the number of existing categories and the numbers of responses in the existing categories. No hard and fast rules and consultation with the user of the resulting data may be needed.
Here is one rough outline. Suppose I have a multiple choice question with 3 planned categories and 1000 survey respondents. Response A was selected 200 times, response B 400 times and response C 300 times with 500 total open end responses AFTER coding them back to the response categories.
If one or more of the open end responses has more than 200 responses that almost certainly should be made into a separate category as that is larger than one of the planned responses (an likely implies a certain blindness on the part of the question designer). If I have an open end response that has 100 responses that would be a good candidate for another added category. 10 responses, not so much.
A very rough rule of thumb my company used was if any open end was less than 5% of the sample then the only time to consider it as a new category would be if that exceeded the number of responses of one of the planned categories. This is more likely to happen when there are large numbers of planned responses, say over 40, and often involve a certain "softness" of definition of the terms.
HOW to do that can depend on what you want the data to look like at the end and how your existing data is coded.
Typically I would use a data step with lists of respondents and the recoding for each variable. There are a couple of approaches. You could make data sets that have the "new" values and use an Update to change them. Another approach is in data step coding using any of the IF /Then or possibly select.
A general approach is to create a new variable to recode into so you do not lose the existing information such as this:
data work.classrecode; set sashelp.class; array s (*) age height weight; array r (3) r_age r_height r_weight; /* copy the existing values into the recode variables*/ do i=1 to dim(s); r[i] = s[i]; end; /* the sashelp.class data has NAME as an identifier*/ /* For some reason I want to change some values in this case I need to age some of the students*/ if name in ('Alice' 'Janet' 'Philip') then r_age=r_age+1; /* others as needed*/ run;
You could use something like
if text in ('administrative assistant' 'admin. assit.' 'admin. assit' 'manager assistant' 'man. assit.') then text = 'Assistant';
This whole process is usually iterative and may take a couple of passes. You may also end up consolidating some of your categories based on results. I for one do not know the immediate difference between a "technician" and "specialist".
I found Proc Freq to be very helpful with openends. One thing is the results get sorted alphabetically. So you may want to take a step to set the case to all upper or lower case before starting the cleaning.
I dealt with one survey where we had roughly 20,000 responses and we were asking about who file billing information which had 5 categories and an open end. We started with roughly 4,000 open ended responses to that question. Recoded about 95% into our existing categories after cleaning up the interviewer spelling.
I worked on another survey where the questions were about companies and were basically all open ends. We did provide the interviewers with a list of about 50 expected frequent answers. We still had to clean up the data because even given instructions to enter IBM we had 19 different spellings. My favorite was I>B>M> (Hold down the caps key and press i.b.m. The instructions provided to the interviewers were "no periods" because we had many companies that would use initials.
Good luck!
Thank you for your guidance it was very helpful! I didn't use arrays and did the following instead:
/*This SAS code allows you to combine open ended survey responses and create new variables or categories to capture the answers specified within the If other specify multiple choice answer*/
*IMPORT: to import the SAS file and create a library*;
libname saslib "\\(ENTER YOUR LIBRARY PATHWAY HERE"; *creates a permanent SAS File. Creating the library allows us to refer to a file in a specific folder.*;
run;
*Look at your dataset*
data year19;
set saslib.survey;
run;
proc contents data=year19 VARNUM out=contents; *VarNUM list the variables in the order that they were created;
run;
proc sort data=year19;
by surveyYears;
run;
/* List variable*/;
proc print data=year19;
var admin educat tech Other OthSpecify;
run;
/*Format the question responses*/
proc format;
value admin educat tech Other OthSpecify;
1="yes" 2="no";
run;
/* Calculate summary statistics */;
proc means data= year19;
var admin educat tech Other OthSpecify;
run;
/* Examine extreme values */ ;
proc univariate data= year19;
var admin educat tech Other OthSpecify;
run;
/* List unique values and frequencies */
proc freq data= year19;
tables admin educat tech Other OthSpecify;
run;
*FINDINGS: In 2019, the survey respondents selected the following multiple choice options (output descriptive statistics listed below and changed for privacy)
1- admin
N: 900 (number of people responding no)/90% (percentage of people responding no)
Y: 100/10%
2- educat
N: 900/90%
Y: 100/10%
3- tech
N: 900/90%
Y: 100/10%
4- Other
N: 900/90%
Y: 100/10%
5- OthSpecify
N: 900/90%
Y: 100/10%
/* if Other, please specify, recoded below*/
*Using one recoding variable*;
data year19_1; /* create a new dataset (name) for SAS to make any changes to your data*/
set year19; /*old dataset name*/
*Create new variables for the categories above that could have been selected as a response. Meaning, some of the OthSpecify responses should have been one of the 4 multiple choice options available*;
admin_recode = 0;
educat_recode = 0;
tech_recode = 0;
Other_recode = 0;
*Here, setting the initial value of these variables to 0. The value will change if the variable meets the criteria outlined below for each observation. Below we recode those other specify responses that belong to one of the existing multiple choice answers.;
if OthSpecify in
("admin assistant"
"adminn"
"admin assistants")
then admin_recode = 1;
/*you are putting all the values that need to be recoded as admin in the ()*/
else if OthSpecify in
"educator"
"educ")
then educat_recode = 1;
else if OthSpecify in
('technology assistant"
"techs")
then tech_recode = 1;
*Create a final variable that will capture all situations where a respondent indicated a response to the multiple choice answer which will = Yes or provided a value for OthSpecify had a write-in answer;
*Create a final admin variable;
if admin_recode = 1 or admin = "Y" then admin_final = 1;
else admin_final = 0;
*Create a final educat variable;
if educat _recode = 1 or educat = "Y" then educat _final = 1;
else educat _final = 0;
*Create a final tech variable;
if tech _recode = 1 or tech = "Y" then tech _final = 1;
else tech _final = 0;
*Create a final Oth variable;
If
admin_final = 0 and
educat_final = 0 and
tech_final = 0 /*make sure to not add the word and to the last variable*/
then OthSpecify_final = 1;
else OthSpecify_final = 0;
run;
*Printing new recodes*;
proc freq data=year19_1;
tables admin_final educat_final tech_final Other_final OthSpecify_final;
run;
proc print data=year19_1;
var admin_final educat_final tech_final Other_final OthSpecify_final;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.