Re: Proc freq

Barney1998 · Posted 10-23-2023 04:31 PM

Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?

ballardw · Posted 10-23-2023 04:39 PM

Quite often if a single value is wanted that involves more than one variable (SAS uses variables, not "columns" because columns are subject to order changes that variables aren't) then you are responsible for providing the logic to create such a report item usually by adding it to a data set with code.

So, provide example data and what you expect as a result along with how to get a single category from the 3 variables of interest.

I have no idea what "similarity of such answers" would mean in this context either without a lot of details.

@Barney1998 wrote:

Hello,
I would like you to help me with a question. I have a questionnaire, and I would like to determine the most probable answer to a question. The issue is that this answer consists of up to 3 choices from 3 different columns of another table. Since there is no specific order, when I try to apply "proc freq" as "variable1 * variable2 * variable3," it recognizes different combinations of answers as distinct, such as "mental health, no smoke, drinking water" and "mental health, drinking water, no smoke." Is there a way for the combinations it generates to take into account the similarity of such answers?

Barney1998 · Posted 10-24-2023 01:55 AM

Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcohol

no physical exercise
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle.

But if a respondent answer to this question like this
No Smoking Alcohol  no physical exercise

And the one other like this:
no physical exercise  Alcohol  No Smoking

Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is No Smoking Alcohol  no physical exercise to be taken into account as the same answer as no physical exercise  Alcohol  No Smoking.

ballardw · Posted 10-24-2023 06:07 AM

First a comment: Multiple values in a single variable is a bad thing. It leads to all sorts of headache such as this one.

Each response should be a separate variable in SAS.

This is typically easiest done at the time data is read into SAS.

The steps would be to segregate your responses into separate new variables then use Call Sortc on those variables and concatenate back into a single NEW variable with the order. (One way).

Without at least an example of every single possible single response (not the combinations) I won't even attempt to provide an example as I am afraid that your data is going to be ugly to parse out.

Barney1998 · Posted 10-24-2023 07:14 AM

I'm trying to explain that a person can provide up to 3 answers they have selected from the possible options.

Let's say the list is the one I mentioned earlier:
Smoking
Drinking water
Alcohol
no physical exercise

Someone can respond:
Smoking
Someone can respond:
Drinking water
Someone can respond:
Drinking water, Alcohol

While someone else may respond:
Alcohol, Drinking water

The problem is that I can count how many times "Drinking water" (or any other choice) appeared, but I can't analyze the combinations that arise in the responses because "Alcohol, Drinking water" is considered a different answer from "Drinking water, Alcohol."

So, I would like to see how I can make the code understand that "Alcohol, Drinking water" is the same as "Drinking water, Alcohol."

In other words, based on the example I provided earlier, I want to have response frequencies like this:

No Smoking 1
Drinking water 1
Drinking water, Alcohol 2

Quentin · Posted 10-24-2023 07:32 AM

Please show an example of the data you have. Preferably by posting the code for a DATA step that uses the CARDS statement, with data for a few survey respondents. Maybe just these 4 respondents you mention:

ID1 responded:
Smoking

ID2 responded:
Drinking water

ID3 responded:
Drinking water, Alcohol

ID4 responded:
Alcohol, Drinking water

The idea of "check-all-that-apply" questions is common. But there are different ways data this sort of question can be stored in a SAS dataset. For example, there could be 3 different variables (Smoking yes/no; DrinkingWater yes/no; Alcohol yes/no), or there could be a list of answers in one variable. In order to help you with an approach, we need to see an example of your data, in a data set (with an ID).

Tom · Posted 10-25-2023 08:46 AM

Do you already have the data stored? or are you trying to design how you will store the responses?

There are many ways to handle multiple response data.

The simple way is with multiple observations.

data have;
  infile cards dsd truncover;
  input id response :$40.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water
5,Alcohol
6,Alcohol
6,Drinking water
;

In that case just you can get the number of people with ALCOHOL use directly from PRC FREQ.

You could store it as independent YES/NO variables.

data have;
  input id smoking water alcohol none;
cards;
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 0 1 1 0
6 0 1 1 0
;

In this case you can just SUM the boolean variables to find the counts.

You could store it as a series of variables that record their choices.

data have;
  infile cards dsd truncover;
  input id (response1-response3) ( :$40.);
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;

In this case to make 5 look like 6 you can use CALL SORTC.

data want;
  set have;
  call sortc(of response3-response1);
run;

Now you can see that 5 and 6 have the same pattern. But counting number of ALCOCHOL users will be hard.

You could store all of the responses into a single variable with some type of delimiter between them.

data have;
  infile cards dsd truncover;
  input id responses $70.;
cards;
1,Smoking
2,Drinking water
3,Alcohol
4,no physical exercise
5,Drinking water,Alcohol
6,Alcohol,Drinking water
;

If that is what you have you will need to convert it first. Probably easiest to make multiple observations.

data want;
  set have;
  length order 8 response $40 ;
  do order=1 by 1 until (order >= countw(responses,','));
    response=scan(responses,order,',');
    output;
  end;
  drop responses;
run;

Reeza · Posted 10-24-2023 10:59 AM

Did you try the call sortc? How did it not work?

Your first response indicates that you have each value in a separate variable, is this correct?

data have;
    infile cards dlm=',' truncover;
    informat var1-var3 $20.;
    input var1 var2 var3;
    cards;
No smoking, Drinking water, Alcohol
Drinking water, Alcohol, No smoking 
Alcohol, Drinking water, No physical exercise  
No physical exercise, Drinking water, Alcohol
;

proc print data=have;
run;

data sorted;
    set have;
    call sortc(var1, var2, var3);
run;

proc freq data=sorted;
    table var1*var2*var3 / list;
run;

@Barney1998 wrote:

Ι have a data set which contains the options for the question (the possible answers for the question ) for example :
Healthy life data:
No Smoking
Drinking water
Alcohol

no physical exercise
...
And the respondent could have chosen until 3 answers in order to specify the factors that they believe have an impact in healthy lifestyle.

But if a respondent answer to this question like this
No Smoking Alcohol  no physical exercise

And the one other like this:
no physical exercise  Alcohol  No Smoking

Particularly, they are the same answers but code recognizes them as different.
That i want to achieve, that is No Smoking Alcohol  no physical exercise to be taken into account as the same answer as no physical exercise  Alcohol  No Smoking.

Barney1998 · Posted 10-25-2023 02:57 AM

Initially, I have separated each answer into each column (scan). Unfortunately, the method you suggested only works in the case where the candidate answers with up to 2 of the possible choices.

Kurt_Bremser · Posted 10-25-2023 05:09 AM

We still have no idea about your data structure. How does your data arrive in the first place? Out of a database, as a text file, Excel?

If text, copy/paste example data into a window opened with this button:

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 10-25-2023 10:50 AM

No data

No example of your code

No evidence

Call Sortc will work with as many variables as you provide if done correctly. However you don't show what you did that makes you claim "only works in the case where the candidate answers with up to 2 of the possible choices".

Reeza · Posted 10-23-2023 05:26 PM

Use CALL SORTC() on the columns to sort the columns in alphabetical order that will allow them to show up the same.

Registration is open

SAS Training: Just a Click Away