BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

I have a data set w/ some multiple response variables (ie, "Select all that apply"). Each potential response is its own variable. Ultimately, I am seeking syntax that will organize these responses so that I can produce a bar chart showing the number and percent of responses (See attachment entitled Desired Output. I have attached a subset of the data set.).

 

Thank you very much in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

This is similar and a basic change from the previous example:

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      q1sum = sum(of q1_:);
      sex =  rand('integer',2)-1;
      output;
   end;

run;

proc tabulate data=example;
   var q1_: q1sum;
   class sex;
   table q1_:,
         sex* (n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1 )
         /  style=[pretext='Percentages of respondents']
   ;
   table  (q1_:) q1sum ,
          sex* ( pctsum<q1sum>='Percent of all selected'  
            sum='Number'*f=best5.  )
           /
   ;
   label q1sum='All Q1 responses';
run;

So if you have a sex / gender / whatever variable to make subgroups that is a CLASS variable.

You could suppress the variable name/ default label by using: sex=' '* (<statistics).

If you MUST have the N in the column heading that is a bit more work and may involve macro coding.

If the class variables do not have the text you want to see in the table then create a format to create the desired text. Formats can also create report groups just by applying a different format.

 

There is a paper at https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

that has some other approaches to making tables that may help.

View solution in original post

8 REPLIES 8
ballardw
Super User

So you have a multiple response with the selected responses as 1 and 0 for not selected. Is that a correct understanding of your data?

 

Your PDF shows percentages of non-multiple response values. How do I know that? Your table of percentages shows a total of 100%. If you have actual multiple response questions the only way you would get a 100% is if no one selects more than 1 response.

 

Your data is well set up to answer percentage of respondents making each selection.

I'm not going to attempt to use your "data". The below simulates 10 records of a question with 5 possible multiple response answers.

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      output;
   end;
run;

proc tabulate data=example;
   var q1_:;
   table q1:,
         n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1
         /
   ;
run;

If you want more "percentages" then you will need to carefully describe the numerator and the denominator.

 

_maldini_
Barite | Level 11

@ballardw Thanks for your help.

 

Apologies for the confusion. 

 

<Your table of percentages shows a total of 100%. >

The desired numerator is the total number of responses for that question. The desired denominator is the total number of responses overall - for all questions. As opposed to the total number of respondents. 

 

Also, ideally each question would be output as a row in a table, like a single response variable in PROC FREQ output. 

 

Your syntax is useful though and I may use it instead! Thank you.

 

Is there alternative syntax you might recommend for percent of responses?

 

🙏

_maldini_
Barite | Level 11

Also...I was just trying to get some help on Step 1 of this process. My ultimate goal is to look at these responses across another variable (See Desired Output 2.0). Perhaps I should have posted that from the beginning...

 

I'm basically wanting to do a PROC FREQ on a multiple response variable where each response is it's own variable. I'm open to other ways to think about this. 

 

Thank you. 

ballardw
Super User

You might look at this using a modification of the previous data set:

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      q1sum = sum(of q1_:);
      output;
   end;

run;

proc tabulate data=example;
   var q1_: q1sum;
   table q1_:,
         n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1
         /  style=[pretext='Percentages of respondents']
   ;
   table  (q1_:) q1sum ,
           pctsum<q1sum>='Percent of all selected'  
            sum='Number'*f=best5.
           /
   ;
   label q1sum='All Q1 responses';
run;

The Q1sum is the count per respondent of selected responses for question 1.

In the proc tabulate a second table is needed because of the behavior of crossing statistics, i.e. it won't let you. Caution with the PCTSUM part. This is one of the very few places where < > are used and is done so because it means something radically different than (). The () would be a group of variables. In PCTSUM it is the name of a single variable and its sum is used as the denominator for calculations of percentage. There is another for count, N for classification variables: PCTN<var> that uses the count of that variable as a denominator. Proc Tabulate offers several other Percentage calculations like ROWPCTSUM or COLPCTSUM. Do not use the <var> with any other of the pct statistics. Unless things have change that can result in a serious, as in terminate SAS runtime data exception.

 

Some other forms of analysis might require reshaping your data so that instead of 5 (or what ever number of variables) with numeric 1/0 coding to making a variable that holds the Name, such as Q1_1 and the value.

Respondent Qname   value

1                  Q1_1       1

1                  Q1_2       0

1                  Q1_3       1

1                  Q1_4       0

1                  Q1_5       0

Then you could select the records where "value = 1" and get counts.

 

If you have a lot of variables that have the same coding, such as typical "strongly disagree", "disagree","neutral", "agree","agree strongly" you could use a common label and and a variable with the question name

Respondent QuestionNumber Category Value

1                  Q1                       1       1

1                  Q1                       2       0

1                  Q1                       3       1

1                  Q1                      4       0

1                  Q1                      5       0

 

Assign a custom format to get the text of the category for display. This would let you get counts ACROSS questions of the agreement level. May want to generate this count by respondent first and then do final statistics as well.

 

If your data is weighted survey data there are a bunch of other caveats that come into play about when/where you apply the weights. Reshaping data has the potential to generate very bizarre results because the weights no longer mean the same thing.

_maldini_
Barite | Level 11

This is great. Thank you!

Another question: Is there a way to look at these responses across another variable (eg. gender, see attachment Desired output 2.0.pdf)?

_maldini_
Barite | Level 11

@ballardw Is there a way to look at these responses across another variable (eg. gender, see attachment Desired output 2.0.pdf)?

ballardw
Super User

This is similar and a basic change from the previous example:

DATA example;
   do respondent=1 to 10;
      q1_1 = rand('integer',2)-1;
      q1_2 = rand('integer',2)-1;
      q1_3 = rand('integer',2)-1;
      q1_4 = rand('integer',2)-1;
      q1_5 = rand('integer',2)-1;
      q1sum = sum(of q1_:);
      sex =  rand('integer',2)-1;
      output;
   end;

run;

proc tabulate data=example;
   var q1_: q1sum;
   class sex;
   table q1_:,
         sex* (n='Respondents' sum="Number of selected"*f=best5.
         mean="Percent selected"*f=percent8.1 )
         /  style=[pretext='Percentages of respondents']
   ;
   table  (q1_:) q1sum ,
          sex* ( pctsum<q1sum>='Percent of all selected'  
            sum='Number'*f=best5.  )
           /
   ;
   label q1sum='All Q1 responses';
run;

So if you have a sex / gender / whatever variable to make subgroups that is a CLASS variable.

You could suppress the variable name/ default label by using: sex=' '* (<statistics).

If you MUST have the N in the column heading that is a bit more work and may involve macro coding.

If the class variables do not have the text you want to see in the table then create a format to create the desired text. Formats can also create report groups just by applying a different format.

 

There is a paper at https://communities.sas.com/t5/SAS-Communities-Library/Demographic-Table-and-Subgroup-Summary-Macro-...

that has some other approaches to making tables that may help.

tarheel13
Rhodochrosite | Level 12

There's more than one way to do this.

one way is with SQL. 

 

proc sql;

select sum(q23_1) as count_q23, (calculated count_q23/count(*))*100 as percent

from table;

quit;

 

You have to use the same logic for each question you want the counts and percents for.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 3068 views
  • 4 likes
  • 3 in conversation