BookmarkSubscribeRSS Feed
Barney1998
Obsidian | Level 7

Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?

11 REPLIES 11
ballardw
Super User

Quite often if a single value is wanted that involves more than one variable (SAS uses variables, not "columns" because columns are subject to order changes that variables aren't) then you are responsible for providing the logic to create such a report item usually by adding it to a data set with code.

 

So, provide example data and what you expect as a result along with how to get a single category from the 3 variables of interest.

 

I have no idea what "similarity of such answers" would mean in this context either without a lot of details.

 


@Barney1998 wrote:

Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?


 

Barney1998
Obsidian | Level 7

Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcohol  

no physical exercise 
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle. 


But if a respondent answer to this question like this 
No Smoking Alcohol  no physical exercise 

And the one other like this:
 no physical exercise  Alcohol  No Smoking

Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is   No Smoking Alcohol  no physical exercise  to be taken into account as the same answer as  no physical exercise  Alcohol  No Smoking.

ballardw
Super User

First a comment: Multiple values in a single variable is a bad thing. It leads to all sorts of headache such as this one.

Each response should be a separate variable in SAS.

This is typically easiest done at the time data is read into SAS.

 

The steps would be to segregate your responses into separate new variables then use Call Sortc on those variables and concatenate back into a single NEW variable with the order. (One way).

 

Without at least an example of every single possible single response (not the combinations) I won't even attempt to provide an example as I am afraid that your data is going to be ugly to parse out.

 

 

Barney1998
Obsidian | Level 7

I'm trying to explain that a person can provide up to 3 answers they have selected from the possible options.

Let's say the list is the one I mentioned earlier:
Smoking
Drinking water
Alcohol
no physical exercise


Someone can respond:
Smoking
Someone can respond:
Drinking water
Someone can respond:
Drinking water, Alcohol

While someone else may respond:
 Alcohol, Drinking water


The problem is that I can count how many times "Drinking water" (or any other choice) appeared, but I can't analyze the combinations that arise in the responses because "Alcohol, Drinking water" is considered a different answer from "Drinking water, Alcohol."

So, I would like to see how I can make the code understand that "Alcohol, Drinking water" is the same as "Drinking water, Alcohol."

In other words, based on the example I provided earlier, I want to have response frequencies like this:

No Smoking 1
Drinking water 1
Drinking water, Alcohol 2

Quentin
Super User

Please show an example of the data you have.  Preferably by posting the code for a DATA step that uses the CARDS statement, with  data for a few survey respondents. Maybe just these 4 respondents you mention:

ID1 responded:
Smoking

ID2 responded:
Drinking water

ID3 responded:
Drinking water, Alcohol

ID4 responded:
Alcohol, Drinking water

The idea of "check-all-that-apply" questions is common.  But there are different ways data this sort of question can be stored  in a SAS dataset. For example, there could be 3 different variables (Smoking yes/no; DrinkingWater yes/no; Alcohol yes/no), or there could be a list of answers in one variable. In order to help you with an approach, we need to see an example of your data, in a data set (with an ID).

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.
Tom
Super User Tom
Super User

Do you already have the data stored?  or are you trying to design how you will store the responses?

There are many ways to handle multiple response data.

 

The simple way is with multiple observations.

data have;
  infile cards dsd truncover;
  input id response :$40.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water
5,Alcohol
6,Alcohol
6,Drinking water
;

In that case just you can get the number of people with ALCOHOL use directly from PRC FREQ.

 

You could store it as independent YES/NO variables.

data have;
  input id smoking water alcohol none;
cards;
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 0 1 1 0
6 0 1 1 0
;

In this case you can just SUM the boolean variables to find the counts.

 

You could store it as a series of variables that record their choices.

data have;
  infile cards dsd truncover;
  input id (response1-response3) ( :$40.);
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;

In this case to make 5 look like 6 you can use CALL SORTC.

data want;
  set have;
  call sortc(of response3-response1);
run;

Tom_0-1698237624191.png

Now you can see that 5 and 6 have the same pattern.  But counting number of ALCOCHOL users will be hard.

 

You could store all of the responses into a single variable with some type of delimiter between them.

data have;
  infile cards dsd truncover;
  input id responses $70.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;

If that is what you have you will need to convert it first.   Probably easiest to make multiple observations.

data want;
  set have;
  length order 8 response $40 ;
  do order=1 by 1 until (order >= countw(responses,','));
    response=scan(responses,order,',');
    output;
  end;
  drop responses;
run;

Tom_0-1698238104314.png

 

 

 

Reeza
Super User

Did you try the call sortc? How did it not work?

Your first response indicates that you have each value in a separate variable, is this correct?

 

data have;
    infile cards dlm=',' truncover;
    informat var1-var3 $20.;
    input var1 var2 var3;
    cards;
No smoking, Drinking water, Alcohol
Drinking water, Alcohol, No smoking 
Alcohol, Drinking water, No physical exercise  
No physical exercise, Drinking water, Alcohol
;

proc print data=have;
run;

data sorted;
    set have;
    call sortc(var1, var2, var3);
run;

proc freq data=sorted;
    table var1*var2*var3 / list;
run;

@Barney1998 wrote:

Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcohol  

no physical exercise 
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle. 


But if a respondent answer to this question like this 
No Smoking Alcohol  no physical exercise 

And the one other like this:
 no physical exercise  Alcohol  No Smoking

Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is   No Smoking Alcohol  no physical exercise  to be taken into account as the same answer as  no physical exercise  Alcohol  No Smoking.


 

Barney1998
Obsidian | Level 7

Initially, I have separated each answer into each column (scan). Unfortunately, the method you suggested only works in the case where the candidate answers with up to 2 of the possible choices.

Kurt_Bremser
Super User

We still have no idea about your data structure. How does your data arrive in the first place? Out of a database, as a text file, Excel?

If text, copy/paste example data into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

ballardw
Super User

No data

No example of your code

No evidence

 

Call Sortc will work with as many variables as you provide if done correctly. However you don't show what you did that makes you claim "only works in the case where the candidate answers with up to 2 of the possible choices".

Reeza
Super User
Use CALL SORTC() on the columns to sort the columns in alphabetical order that will allow them to show up the same.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2073 views
  • 2 likes
  • 6 in conversation