BookmarkSubscribeRSS Feed
enginemane44
Calcite | Level 5
Hi all,
I'm trying to analyze survey data of the 'mark all that apply' type questions.
I've broken each question down to a series of yes/no responses :
q1 q2 q3 q4 q5
Y N N Y Y
I then reformat the data and output a data set (exp) to feed to PROC TABULATE :
This data set lists the labels of the questions that have a 'Y' response. For example, the one observation above would expand into three observations :
question 1
question 4
question 5
Unfortunately, when I use Proc Tabulate and request counts and percentages for each question, the percentages are too low because they are using N for the *expanded* data set.

Here is the code to expand the original data set :

data exp;
set main;
length cat $ 48;
array md{20} q1--q20;
do j = 1 to 20;
idx = j;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
output;
end;
end;
run;

Note that the format fgen is simply 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3' ...

The tabulate procedure looks like this :


proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / rts = 50;
run;

The first column - the counts - are accurate, BUT the percentages are all too low - ostensibley because the counts are being divided by the N for the expanded data set, not the original data set.
Does anyone know a way around this ? I've tried using the < > operator to change percent denominators, the ROWPCTN option - without success.
Any help/hints would be appreciated.
Barry Walton
Barry.Walton@millersville.edu
5 REPLIES 5
ArtC
Rhodochrosite | Level 12
Let me see if I understand. You want the denominator to be the number of respondants for the question, but you have eliminated all of the 'N' values. If you had Y and N in your data the ROWPCTN would work.
[pre]*Build data;
data survey;
do ID = 1 to 4;
do question = 1 to 5;
if ranuni(999999)>.7 then resp='Y';
else resp='N';
if ranuni(99999)>.3 then output;
end;
end;
run;
proc freq data=survey;
table question*resp;
run;
proc tabulate data=survey;
class question resp;
table question,resp*(n rowpctn);
run;[/pre]
enginemane44
Calcite | Level 5
ArtC,
Yes, your code worked perfectly, and I got a nice tabulation of Yes and No responses with their respective percentages :
Yes No
N PCNT N PCNT
Question 1 7 70 % 3 30 %
Question 2 5 50 % 5 50 %
...
What I wanted to do is count Yes responses only *as a percent of the original data*. Thus, the table would look like this :
Yes Responses
Question 1 7 70%
Question 2 5 50 %
...
The problem is when the data set is expanded, N increases and the percentages that appear are too low inasmuch as they are based on the N from the expanded data set, not the original data set. I fiddled around with the <> option to specify a custom denominator - without success. I'm running SAS 9.1 under Windows XP.
Here is code that will demonstrate the problem. Any ideas ?
proc format ;

value fgen 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3'
4 = 'Question 4'
5 = 'Question 5';


data main;
input q1$ q2$ q3$ q4$ q5$;
datalines;
Y Y Y N N
N N Y Y N
N N Y N Y
N N N Y Y
Y Y Y Y Y
N Y N Y Y
Y Y Y N Y
Y Y Y N N
N N Y Y N
Y N Y Y Y
;


proc print data = main;
title 'Data set main';
run;

proc freq data = main;
tables q1--q5;
title 'Test frequency counts';
run;


data exp; * Expand the data set to feed to Proc Tabulate ;
set main;
length cat $ 48;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
idx = j;
output;
end;
end;
run;


proc sort data = exp;
by idx;
run;


proc print data = exp;
title 'Data set exp';
run;



proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / box = 'Survey data' rts = 15;
label cat = 'Category';
title "Tabulation of 'Mark all that apply' questons";
run;

Barry Walton
Barry.Walton@millersville.edu
ArtC
Rhodochrosite | Level 12
Using Peter's suggestion, I have modified my example. Notice that RESP is now a 0,1 variable and a VAR statement has been included in the TABULATE step.
[pre]data survey;
do ID = 1 to 14;
do question = 1 to 5;
if ranuni(999999)>.4 then resp=1;
else resp=0;
if ranuni(99999)>.3 then output;
end;
end;
run;

proc tabulate data=survey;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;

[/pre]
Cynthia_sas
Diamond | Level 26
To build on ArtC's example, if you just change your program that builds the dataset WORK.EXP (so that you have 1's and 0's for Y and N), you can use ArtC's code to see how the report looks with your data. There is no need to keep all 5 questions for every obs, because with this approach, you have essentially transposed the data so that instead of one observation/row with all 5 of the responses, you now have one row per response in the WORK.EXP dataset.

I did not repeat the data reading program (for WORK.MAIN), as that stayed the same, but I did change the code starting with DATA EXP step.

cynthia
[pre]

data exp(keep=orig_obs question resp idx char_resp);
* Expand the data set to feed to Proc Tabulate ;
** In this version of the data, create variables QUESTION and RESP;
** per ArtC example. Also keep track of orig_obs number, ;
** in case you have to go back and research any strange answers.;
** Keep character version of response;
** in CHAR_RESP variable. RESP variable will be 1 or 0;
set main;
orig_obs = _n_;

length question cat $ 48 char_resp $1;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then resp=1;
else resp=0;
char_resp = upcase(md{j});
cat = put(j,fgen.);
question = put(j,fgen.);
idx = j;
output;
end;
run;

proc print data = exp;
title 'Data set exp';
run;

ods listing close;
ods html file='c:\temp\tab_survey.html' style=sasweb;
proc tabulate data=exp;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;
ods html close;

[/pre]
Peter_C
Rhodochrosite | Level 12
depending on how you want to solve this PCTSUM might do it.
When you have 1 for yes and 0 for no in a column called resp01, the sum would be the count of yesses and the N would be total respondents(for that question) and mean and pctsum would provide what I think you seek.
check with a table like
table cat, resp01='responses'*(
n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;

I like it when % formatting is natural


luck
peterC
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2206 views
  • 0 likes
  • 4 in conversation