Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?
Quite often if a single value is wanted that involves more than one variable (SAS uses variables, not "columns" because columns are subject to order changes that variables aren't) then you are responsible for providing the logic to create such a report item usually by adding it to a data set with code.
So, provide example data and what you expect as a result along with how to get a single category from the 3 variables of interest.
I have no idea what "similarity of such answers" would mean in this context either without a lot of details.
@Barney1998 wrote:
Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?
Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcohol
no physical exercise
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle.
But if a respondent answer to this question like this
No Smoking Alcohol no physical exercise
And the one other like this:
no physical exercise Alcohol No Smoking
Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is No Smoking Alcohol no physical exercise to be taken into account as the same answer as no physical exercise Alcohol No Smoking.
First a comment: Multiple values in a single variable is a bad thing. It leads to all sorts of headache such as this one.
Each response should be a separate variable in SAS.
This is typically easiest done at the time data is read into SAS.
The steps would be to segregate your responses into separate new variables then use Call Sortc on those variables and concatenate back into a single NEW variable with the order. (One way).
Without at least an example of every single possible single response (not the combinations) I won't even attempt to provide an example as I am afraid that your data is going to be ugly to parse out.
I'm trying to explain that a person can provide up to 3 answers they have selected from the possible options.
Let's say the list is the one I mentioned earlier:
Smoking
Drinking water
Alcohol
no physical exercise
Someone can respond:
Smoking
Someone can respond:
Drinking water
Someone can respond:
Drinking water, Alcohol
While someone else may respond:
Alcohol, Drinking water
The problem is that I can count how many times "Drinking water" (or any other choice) appeared, but I can't analyze the combinations that arise in the responses because "Alcohol, Drinking water" is considered a different answer from "Drinking water, Alcohol."
So, I would like to see how I can make the code understand that "Alcohol, Drinking water" is the same as "Drinking water, Alcohol."
In other words, based on the example I provided earlier, I want to have response frequencies like this:
No Smoking 1
Drinking water 1
Drinking water, Alcohol 2
Please show an example of the data you have. Preferably by posting the code for a DATA step that uses the CARDS statement, with data for a few survey respondents. Maybe just these 4 respondents you mention:
ID1 responded: Smoking ID2 responded: Drinking water ID3 responded: Drinking water, Alcohol ID4 responded: Alcohol, Drinking water
The idea of "check-all-that-apply" questions is common. But there are different ways data this sort of question can be stored in a SAS dataset. For example, there could be 3 different variables (Smoking yes/no; DrinkingWater yes/no; Alcohol yes/no), or there could be a list of answers in one variable. In order to help you with an approach, we need to see an example of your data, in a data set (with an ID).
Do you already have the data stored? or are you trying to design how you will store the responses?
There are many ways to handle multiple response data.
The simple way is with multiple observations.
data have;
infile cards dsd truncover;
input id response :$40.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water
5,Alcohol
6,Alcohol
6,Drinking water
;
In that case just you can get the number of people with ALCOHOL use directly from PRC FREQ.
You could store it as independent YES/NO variables.
data have;
input id smoking water alcohol none;
cards;
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 0 1 1 0
6 0 1 1 0
;
In this case you can just SUM the boolean variables to find the counts.
You could store it as a series of variables that record their choices.
data have;
infile cards dsd truncover;
input id (response1-response3) ( :$40.);
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;
In this case to make 5 look like 6 you can use CALL SORTC.
data want;
set have;
call sortc(of response3-response1);
run;
Now you can see that 5 and 6 have the same pattern. But counting number of ALCOCHOL users will be hard.
You could store all of the responses into a single variable with some type of delimiter between them.
data have;
infile cards dsd truncover;
input id responses $70.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;
If that is what you have you will need to convert it first. Probably easiest to make multiple observations.
data want;
set have;
length order 8 response $40 ;
do order=1 by 1 until (order >= countw(responses,','));
response=scan(responses,order,',');
output;
end;
drop responses;
run;
Did you try the call sortc? How did it not work?
Your first response indicates that you have each value in a separate variable, is this correct?
data have;
infile cards dlm=',' truncover;
informat var1-var3 $20.;
input var1 var2 var3;
cards;
No smoking, Drinking water, Alcohol
Drinking water, Alcohol, No smoking
Alcohol, Drinking water, No physical exercise
No physical exercise, Drinking water, Alcohol
;
proc print data=have;
run;
data sorted;
set have;
call sortc(var1, var2, var3);
run;
proc freq data=sorted;
table var1*var2*var3 / list;
run;
@Barney1998 wrote:
Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcoholno physical exercise
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle.
But if a respondent answer to this question like this
No Smoking Alcohol no physical exercise
And the one other like this:
no physical exercise Alcohol No Smoking
Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is No Smoking Alcohol no physical exercise to be taken into account as the same answer as no physical exercise Alcohol No Smoking.
Initially, I have separated each answer into each column (scan). Unfortunately, the method you suggested only works in the case where the candidate answers with up to 2 of the possible choices.
We still have no idea about your data structure. How does your data arrive in the first place? Out of a database, as a text file, Excel?
If text, copy/paste example data into a window opened with this button:
No data
No example of your code
No evidence
Call Sortc will work with as many variables as you provide if done correctly. However you don't show what you did that makes you claim "only works in the case where the candidate answers with up to 2 of the possible choices".
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.