Hello SAS Community,
I am working on analyzing a multiple response survey question. It has seven possible selections and respondents were allowed to choose as many of the selections as applied to them. Unfortunately, this has left me in a bit of an analysis bind. The variable was sent to me as a character variable, and all the possible selections were concatenated into one long string. So the variable values look something like:
1 010306
2 0204
3 01
4 0502
5 0205
6 01040507
.
.
.
As it is, this variable is not providing me with much information. I would like to create a new variable for analysis, where if the original variable contained lets say 01somewhere in its string it would be counted as 1, 02 would be 2, etc. Is there a way to subset observations into a new variable based on the whether it contains a specific character combination?
As always, I greatly appreciate your help and expertise!
Thanks!
Hi,
Something like:
data have;
id=1;result="010306"; output;
id=2;result="0204"; output;
id=3;result="01"; output;
id=4;result="0502"; output;
id=5;result="0205"; output;
id=6;result="01040507"; output;
run;
data want (drop=i);
set have;
array result_out{7};
do i=1 to 7;
if index(result,"0"||strip(put(i,best.)))>0 then result_out{i}=i;
end;
run;
I would subtr() out each answer, and out put it to a common variable, which means some kind of transponing the data set.
Then counting can be done with whatever tool/procedure you like.
data want;
set have;
no = length(response);
do i =1 to no by 2;
single_response = substr(response,i,2);
output;
end;
keep response_id single_response;
run;
proc sql;
select single_response, count(*) as no_resp
from want
group by single_response
;
quit;
If you a paying a company to administer your survey then I would address this issue with them to provide the data in another format. Many survey software options allow taking a multiple response question and exporting them as Q1_1 through Q1_n where n represents the number of responses as 0/1 coded dichotomous variables to indicate that choice was selected.
If order of response is not important to you, this may be what you are looking for:
data want;
set have;
q1_1 = (index(result,'01') >0);
q1_2 = (index(result,'02') >0);
q1_3 = (index(result,'03') >0);
/* continue obvious pattern*/
run;
If you have more than 10 response categories and get strings like 02031011 you need to add in some code to use boundaries of the 2 characters like
Q1_2 = (mod(index(result,'02'),2)=1);
This type of coding will assign 0 to those questions without responses which I find useful because a mean of the question will give a percent of respondents picking that choice and a sum the number that did plus you can get confidence intervals on the proportions.
If there is a skip pattern involved then only execute the code where the skip says the respondent should have answered the quesion.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.