BookmarkSubscribeRSS Feed
enginemane44
Calcite | Level 5
Hi all,
I'm trying to analyze survey data of the 'mark all that apply' type questions.
I've broken each question down to a series of yes/no responses :
q1 q2 q3 q4 q5
Y N N Y Y
I then reformat the data and output a data set (exp) to feed to PROC TABULATE :
This data set lists the labels of the questions that have a 'Y' response. For example, the one observation above would expand into three observations :
question 1
question 4
question 5
Unfortunately, when I use Proc Tabulate and request counts and percentages for each question, the percentages are too low because they are using N for the *expanded* data set.

Here is the code to expand the original data set :

data exp;
set main;
length cat $ 48;
array md{20} q1--q20;
do j = 1 to 20;
idx = j;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
output;
end;
end;
run;

Note that the format fgen is simply 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3' ...

The tabulate procedure looks like this :


proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / rts = 50;
run;

The first column - the counts - are accurate, BUT the percentages are all too low - ostensibley because the counts are being divided by the N for the expanded data set, not the original data set.
Does anyone know a way around this ? I've tried using the < > operator to change percent denominators, the ROWPCTN option - without success.
Any help/hints would be appreciated.
Barry Walton
Barry.Walton@millersville.edu
5 REPLIES 5
ArtC
Rhodochrosite | Level 12
Let me see if I understand. You want the denominator to be the number of respondants for the question, but you have eliminated all of the 'N' values. If you had Y and N in your data the ROWPCTN would work.
[pre]*Build data;
data survey;
do ID = 1 to 4;
do question = 1 to 5;
if ranuni(999999)>.7 then resp='Y';
else resp='N';
if ranuni(99999)>.3 then output;
end;
end;
run;
proc freq data=survey;
table question*resp;
run;
proc tabulate data=survey;
class question resp;
table question,resp*(n rowpctn);
run;[/pre]
enginemane44
Calcite | Level 5
ArtC,
Yes, your code worked perfectly, and I got a nice tabulation of Yes and No responses with their respective percentages :
Yes No
N PCNT N PCNT
Question 1 7 70 % 3 30 %
Question 2 5 50 % 5 50 %
...
What I wanted to do is count Yes responses only *as a percent of the original data*. Thus, the table would look like this :
Yes Responses
Question 1 7 70%
Question 2 5 50 %
...
The problem is when the data set is expanded, N increases and the percentages that appear are too low inasmuch as they are based on the N from the expanded data set, not the original data set. I fiddled around with the <> option to specify a custom denominator - without success. I'm running SAS 9.1 under Windows XP.
Here is code that will demonstrate the problem. Any ideas ?
proc format ;

value fgen 1 = 'Question 1'
2 = 'Question 2'
3 = 'Question 3'
4 = 'Question 4'
5 = 'Question 5';


data main;
input q1$ q2$ q3$ q4$ q5$;
datalines;
Y Y Y N N
N N Y Y N
N N Y N Y
N N N Y Y
Y Y Y Y Y
N Y N Y Y
Y Y Y N Y
Y Y Y N N
N N Y Y N
Y N Y Y Y
;


proc print data = main;
title 'Data set main';
run;

proc freq data = main;
tables q1--q5;
title 'Test frequency counts';
run;


data exp; * Expand the data set to feed to Proc Tabulate ;
set main;
length cat $ 48;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then do;
cat = put(j,fgen.);
idx = j;
output;
end;
end;
run;


proc sort data = exp;
by idx;
run;


proc print data = exp;
title 'Data set exp';
run;



proc tabulate data = exp format=comma14. order=data;
class cat;
table cat,n="Yes responses" pctn*all="Percent" / box = 'Survey data' rts = 15;
label cat = 'Category';
title "Tabulation of 'Mark all that apply' questons";
run;

Barry Walton
Barry.Walton@millersville.edu
ArtC
Rhodochrosite | Level 12
Using Peter's suggestion, I have modified my example. Notice that RESP is now a 0,1 variable and a VAR statement has been included in the TABULATE step.
[pre]data survey;
do ID = 1 to 14;
do question = 1 to 5;
if ranuni(999999)>.4 then resp=1;
else resp=0;
if ranuni(99999)>.3 then output;
end;
end;
run;

proc tabulate data=survey;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;

[/pre]
Cynthia_sas
SAS Super FREQ
To build on ArtC's example, if you just change your program that builds the dataset WORK.EXP (so that you have 1's and 0's for Y and N), you can use ArtC's code to see how the report looks with your data. There is no need to keep all 5 questions for every obs, because with this approach, you have essentially transposed the data so that instead of one observation/row with all 5 of the responses, you now have one row per response in the WORK.EXP dataset.

I did not repeat the data reading program (for WORK.MAIN), as that stayed the same, but I did change the code starting with DATA EXP step.

cynthia
[pre]

data exp(keep=orig_obs question resp idx char_resp);
* Expand the data set to feed to Proc Tabulate ;
** In this version of the data, create variables QUESTION and RESP;
** per ArtC example. Also keep track of orig_obs number, ;
** in case you have to go back and research any strange answers.;
** Keep character version of response;
** in CHAR_RESP variable. RESP variable will be 1 or 0;
set main;
orig_obs = _n_;

length question cat $ 48 char_resp $1;
array md{5} q1--q5;
do j = 1 to 5;
if md{j} = 'Y' then resp=1;
else resp=0;
char_resp = upcase(md{j});
cat = put(j,fgen.);
question = put(j,fgen.);
idx = j;
output;
end;
run;

proc print data = exp;
title 'Data set exp';
run;

ods listing close;
ods html file='c:\temp\tab_survey.html' style=sasweb;
proc tabulate data=exp;
class question;
var resp;
table question,
resp='responses'*(n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;
run;
ods html close;

[/pre]
Peter_C
Rhodochrosite | Level 12
depending on how you want to solve this PCTSUM might do it.
When you have 1 for yes and 0 for no in a column called resp01, the sum would be the count of yesses and the N would be total respondents(for that question) and mean and pctsum would provide what I think you seek.
check with a table like
table cat, resp01='responses'*(
n='total responders' *f= comma7.
sum='total yes' *f= comma7.
pctsum='response rate for this question'*f=5.1
pctn='rate of Yes over whole survey' *f= 5.
mean='mean Q resp' * f=percent7.1
) /rts=40;

I like it when % formatting is natural


luck
peterC

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1049 views
  • 0 likes
  • 4 in conversation