Solved: Re: Processing Multiple Response Variables

_maldini_ · Posted 01-15-2021 04:03 PM

I have a data set w/ some multiple response variables (ie, "Select all that apply"). Each potential response is its own variable. Ultimately, I am seeking syntax that will organize these responses so that I can produce a bar chart showing the number and percent of responses (See attachment entitled Desired Output. I have attached a subset of the data set.).

Thank you very much in advance.

ballardw · Posted 01-19-2021 07:13 PM

This is similar and a basic change from the previous example:

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      q1sum = sum(of q1_:);
      sex =  rand('integer',2)-1;
      output;
   end;

run;

proc tabulate data=example;
   var q1_: q1sum;
   class sex;
   table q1_:,
         sex* (n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1 )
         /  style=[pretext='Percentages of respondents']
   ;
   table  (q1_:) q1sum ,
          sex* ( pctsum<q1sum>='Percent of all selected'  
            sum='Number'*f=best5.  )
           /
   ;
   label q1sum='All Q1 responses';
run;

So if you have a sex / gender / whatever variable to make subgroups that is a CLASS variable.

You could suppress the variable name/ default label by using: sex=' '* (<statistics).

If you MUST have the N in the column heading that is a bit more work and may involve macro coding.

If the class variables do not have the text you want to see in the table then create a format to create the desired text. Formats can also create report groups just by applying a different format.

There is a paper at https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

that has some other approaches to making tables that may help.

View solution in original post

ballardw · Posted 01-15-2021 04:38 PM

So you have a multiple response with the selected responses as 1 and 0 for not selected. Is that a correct understanding of your data?

Your PDF shows percentages of non-multiple response values. How do I know that? Your table of percentages shows a total of 100%. If you have actual multiple response questions the only way you would get a 100% is if no one selects more than 1 response.

Your data is well set up to answer percentage of respondents making each selection.

I'm not going to attempt to use your "data". The below simulates 10 records of a question with 5 possible multiple response answers.

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      output;
   end;
run;

proc tabulate data=example;
   var q1_:;
   table q1:,
         n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1
         /
   ;
run;

If you want more "percentages" then you will need to carefully describe the numerator and the denominator.

_maldini_ · Posted 01-15-2021 06:48 PM

@ballardw Thanks for your help.

Apologies for the confusion.

The desired numerator is the total number of responses for that question. The desired denominator is the total number of responses overall - for all questions. As opposed to the total number of respondents.

Also, ideally each question would be output as a row in a table, like a single response variable in PROC FREQ output.

Your syntax is useful though and I may use it instead! Thank you.

Is there alternative syntax you might recommend for percent of responses?

🙏

_maldini_ · Posted 01-15-2021 07:13 PM

Also...I was just trying to get some help on Step 1 of this process. My ultimate goal is to look at these responses across another variable (See Desired Output 2.0). Perhaps I should have posted that from the beginning...

I'm basically wanting to do a PROC FREQ on a multiple response variable where each response is it's own variable. I'm open to other ways to think about this.

Thank you.

ballardw · Posted 01-15-2021 07:44 PM

You might look at this using a modification of the previous data set:

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      q1sum = sum(of q1_:);
      output;
   end;

run;

proc tabulate data=example;
   var q1_: q1sum;
   table q1_:,
         n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1
         /  style=[pretext='Percentages of respondents']
   ;
   table  (q1_:) q1sum ,
           pctsum<q1sum>='Percent of all selected'  
            sum='Number'*f=best5.
           /
   ;
   label q1sum='All Q1 responses';
run;

The Q1sum is the count per respondent of selected responses for question 1.

In the proc tabulate a second table is needed because of the behavior of crossing statistics, i.e. it won't let you. Caution with the PCTSUM part. This is one of the very few places where < > are used and is done so because it means something radically different than (). The () would be a group of variables. In PCTSUM it is the name of a single variable and its sum is used as the denominator for calculations of percentage. There is another for count, N for classification variables: PCTN<var> that uses the count of that variable as a denominator. Proc Tabulate offers several other Percentage calculations like ROWPCTSUM or COLPCTSUM. Do not use the <var> with any other of the pct statistics. Unless things have change that can result in a serious, as in terminate SAS runtime data exception.

Some other forms of analysis might require reshaping your data so that instead of 5 (or what ever number of variables) with numeric 1/0 coding to making a variable that holds the Name, such as Q1_1 and the value.

Respondent Qname value

1 Q1_1 1

1 Q1_2 0

1 Q1_3 1

1 Q1_4 0

1 Q1_5 0

Then you could select the records where "value = 1" and get counts.

If you have a lot of variables that have the same coding, such as typical "strongly disagree", "disagree","neutral", "agree","agree strongly" you could use a common label and and a variable with the question name

Respondent QuestionNumber Category Value

1 Q1 1 1

1 Q1 2 0

1 Q1 3 1

1 Q1 4 0

1 Q1 5 0

Assign a custom format to get the text of the category for display. This would let you get counts ACROSS questions of the agreement level. May want to generate this count by respondent first and then do final statistics as well.

If your data is weighted survey data there are a bunch of other caveats that come into play about when/where you apply the weights. Reshaping data has the potential to generate very bizarre results because the weights no longer mean the same thing.

_maldini_ · Posted 01-18-2021 03:10 PM

This is great. Thank you!

Another question: Is there a way to look at these responses across another variable (eg. gender, see attachment Desired output 2.0.pdf)?

_maldini_ · Posted 01-19-2021 06:35 PM

@ballardw Is there a way to look at these responses across another variable (eg. gender, see attachment Desired output 2.0.pdf)?

ballardw · Posted 01-19-2021 07:13 PM