Hey all!
I've got some survey data that came out of Qualtrics where the "Check all that apply" options were all put in the same field with commas separating the options. For example,
We had a question asking what topics people were most interested in (select up to 5) and the results came back like this:
Education,Technology |
Health care,Human services |
Health care,Human services,Justice and law ,Public safety and emergency ,Workforce and the economy |
Education,Housing,Human services,Justice and law ,Technology |
Agriculture,Education,Environment and natural resources,Health care,Workforce and the economy |
Health care,Human services,Technology,Workforce and the economy |
Environment and natural resources,Housing,Justice and law ,Transportation and infrastructure,Workforce and the economy |
Education,Environment and natural resources,Health care,Justice and law ,Technology |
Finance, accounting and procurement,Justice and law ,Tax and fiscal policy,Technology,Workforce and the economy |
Education,Environment and natural resources,Housing,Public safety and emergency ,Transportation and infrastructure |
Agriculture,Energy,Environment and natural resources,Transportation and infrastructure,Workforce and the economy |
Energy,Health care,Human services,Public safety and emergency ,Workforce and the economy |
Health care,Justice and law ,Tax and fiscal policy,Technology,Workforce and the economy |
Environment and natural resources |
Environment and natural resources,Health care,Justice and law ,Transportation and infrastructure,Technology |
Clearly I'd like to split all this data up so that I can get summary counts of each topic. Any thoughts on how I can do this quickly and easily? I'm mostly stumped. I don't even mind if I have a separate data set with one column where each row is a single response. I'm totally cool with that. Just don't know how to get there...
I found one option that sort of looks like what I want, if only I could get it to work for the entire dataset... If you check out the SAS Documentation for the SCAN function (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm) under Example #2 there's a piece where it uses a do loop to split apart each word in a string. However, when I tried this with my dataset (see sample code below) it only gave me the data for the first observation. It's late in the day and I can't figure this one out. Thoughts?
data split; set results;
keep word;
do until(word=' ');
count+1;
word = scan(Q6, count);
output;
end;
run;
The code you came up with is pretty close to working. First, insert this statement before the DO loop:
count=0;
Without it, COUNT just keeps on increasing. So that gets COUNT to start over for each observation.
And second, you would probably be better off changing the OUTPUT statement to be:
if word > ' ' then output;
The code you came up with is pretty close to working. First, insert this statement before the DO loop:
count=0;
Without it, COUNT just keeps on increasing. So that gets COUNT to start over for each observation.
And second, you would probably be better off changing the OUTPUT statement to be:
if word > ' ' then output;
So I changed my code to this, as suggested:
data split; set results;
keep word;
count=0;
do until(word=' ');
count+1;
word = scan(Q6, count, ",");
if word>' ' then output;
end;
run;
And it worked perfectly! Thank you!
You better specify the comma as delimiter
data split;
set results;
length word $40;
do count = 1 to countw(Q6, ",");
word = scan(Q6, count, ",");
output;
end;
run;
Yes! Thank you!
Not far off, so changed your slightly:
data split (drop=i); length word $200; set results; do i=1 to countw(q6,","); word=scan(q6,i); output; end; run;
From working with a number of Survey software platforms I might suggest seeing if there are data export options from the survey software and re-exporting. It is not uncommon to have an option for multiple response questions to create multiple dichotomous(yes/no) responses from a similar question (best) or a "one response category variable per selection" format.
Since the next step after identifying or splitting up your combined field data should be to get something so you know which response is what (the first response does not mean the same thing for all respondents) re-exporting the data with proper settings may save a lot of headaches, especially since if you have many of these multiple response questions.
I will say I much prefer software that doesn't use commas in this particular delimiter role as sometimes the individual response categories may contain one or more commas.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.