## Problems with PROC TABULATE percentages for survey data

Occasional Contributor
Posts: 17

# Problems with PROC TABULATE percentages for survey data

Hi all,
I'm trying to analyze survey data of the 'mark all that apply' type questions.
I've broken each question down to a series of yes/no responses :
q1 q2 q3 q4 q5
Y N N Y Y
I then reformat the data and output a data set (exp) to feed to PROC TABULATE :
This data set lists the labels of the questions that have a 'Y' response. For example, the one observation above would expand into three observations :
question 1
question 4
question 5
Unfortunately, when I use Proc Tabulate and request counts and percentages for each question, the percentages are too low because they are using N for the *expanded* data set.

Here is the code to expand the original data set :

data exp;
set main;
length cat \$ 48;
array md{20} q1--q20;
do j = 1 to 20;
idx = j;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
output;
end;
end;
run;

Note that the format fgen is simply 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3' ...

The tabulate procedure looks like this :

proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / rts = 50;
run;

The first column - the counts - are accurate, BUT the percentages are all too low - ostensibley because the counts are being divided by the N for the expanded data set, not the original data set.
Does anyone know a way around this ? I've tried using the < > operator to change percent denominators, the ROWPCTN option - without success.
Any help/hints would be appreciated.
Barry Walton
Barry.Walton@millersville.edu
Valued Guide
Posts: 653

## Re: Problems with PROC TABULATE percentages for survey data

Let me see if I understand. You want the denominator to be the number of respondants for the question, but you have eliminated all of the 'N' values. If you had Y and N in your data the ROWPCTN would work.
[pre]*Build data;
data survey;
do ID = 1 to 4;
do question = 1 to 5;
if ranuni(999999)>.7 then resp='Y';
else resp='N';
if ranuni(99999)>.3 then output;
end;
end;
run;
proc freq data=survey;
table question*resp;
run;
proc tabulate data=survey;
class question resp;
table question,resp*(n rowpctn);
run;[/pre]
Occasional Contributor
Posts: 17

## Re: Problems with PROC TABULATE percentages for survey data

ArtC,
Yes, your code worked perfectly, and I got a nice tabulation of Yes and No responses with their respective percentages :
Yes No
N PCNT N PCNT
Question 1 7 70 % 3 30 %
Question 2 5 50 % 5 50 %
...
What I wanted to do is count Yes responses only *as a percent of the original data*. Thus, the table would look like this :
Yes Responses
Question 1 7 70%
Question 2 5 50 %
...
The problem is when the data set is expanded, N increases and the percentages that appear are too low inasmuch as they are based on the N from the expanded data set, not the original data set. I fiddled around with the <> option to specify a custom denominator - without success. I'm running SAS 9.1 under Windows XP.
Here is code that will demonstrate the problem. Any ideas ?
proc format ;

value fgen 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3'
4 = 'Question 4'
5 = 'Question 5';

data main;
input q1\$ q2\$ q3\$ q4\$ q5\$;
datalines;
Y Y Y N N
N N Y Y N
N N Y N Y
N N N Y Y
Y Y Y Y Y
N Y N Y Y
Y Y Y N Y
Y Y Y N N
N N Y Y N
Y N Y Y Y
;

proc print data = main;
title 'Data set main';
run;

proc freq data = main;
tables q1--q5;
title 'Test frequency counts';
run;

data exp; * Expand the data set to feed to Proc Tabulate ;
set main;
length cat \$ 48;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
idx = j;
output;
end;
end;
run;

proc sort data = exp;
by idx;
run;

proc print data = exp;
title 'Data set exp';
run;

proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / box = 'Survey data' rts = 15;
label cat = 'Category';
title "Tabulation of 'Mark all that apply' questons";
run;

Barry Walton
Barry.Walton@millersville.edu
Valued Guide
Posts: 653

## Re: Problems with PROC TABULATE percentages for survey data

Using Peter's suggestion, I have modified my example. Notice that RESP is now a 0,1 variable and a VAR statement has been included in the TABULATE step.
[pre]data survey;
do ID = 1 to 14;
do question = 1 to 5;
if ranuni(999999)>.4 then resp=1;
else resp=0;
if ranuni(99999)>.3 then output;
end;
end;
run;

proc tabulate data=survey;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;

[/pre]
SAS Super FREQ
Posts: 9,370

## Re: Problems with PROC TABULATE percentages for survey data

To build on ArtC's example, if you just change your program that builds the dataset WORK.EXP (so that you have 1's and 0's for Y and N), you can use ArtC's code to see how the report looks with your data. There is no need to keep all 5 questions for every obs, because with this approach, you have essentially transposed the data so that instead of one observation/row with all 5 of the responses, you now have one row per response in the WORK.EXP dataset.

I did not repeat the data reading program (for WORK.MAIN), as that stayed the same, but I did change the code starting with DATA EXP step.

cynthia
[pre]

data exp(keep=orig_obs question resp idx char_resp);
* Expand the data set to feed to Proc Tabulate ;
** In this version of the data, create variables QUESTION and RESP;
** per ArtC example. Also keep track of orig_obs number, ;
** in case you have to go back and research any strange answers.;
** Keep character version of response;
** in CHAR_RESP variable. RESP variable will be 1 or 0;
set main;
orig_obs = _n_;

length question cat \$ 48 char_resp \$1;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then resp=1;
else resp=0;
char_resp = upcase(md{j});
cat = put(j,fgen.);
question = put(j,fgen.);
idx = j;
output;
end;
run;

proc print data = exp;
title 'Data set exp';
run;

ods listing close;
ods html file='c:\temp\tab_survey.html' style=sasweb;
proc tabulate data=exp;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;
ods html close;

[/pre]
Valued Guide
Posts: 2,191

## Re: Problems with PROC TABULATE percentages for survey data

depending on how you want to solve this PCTSUM might do it.
When you have 1 for yes and 0 for no in a column called resp01, the sum would be the count of yesses and the N would be total respondents(for that question) and mean and pctsum would provide what I think you seek.
check with a table like
table cat, resp01='responses'*(
n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;

I like it when % formatting is natural

luck
peterC
Discussion stats
• 5 replies
• 263 views
• 0 likes
• 4 in conversation