Hi Everyone!
I'm looking to calculate response rates for a survey conducted. I have a sample dataset below but the actual dataset contains many more participants and variables.
data sample;
input ID var1 var2 var3 var4 var5;
datalines;
1 3 4 1 3 .
2 2 . 3 1 .
3 1 . 3 2 4
4 1 . 4 4 6
5 1 1 1 . .
6 . 2 . 3 .
7 2 1 1 . 1
8 . 1 4 . 4
9 2 6 . 1 .
10 . . . . .
;
run;
I have three tiers of response rates.
1) "Response" defined as anyone who returns a survey and answers >1 question
2) "Partial Completion" defined as anyone who returns a survey with 50-80% of the questions responded to
3) "Completion" defined as anyone who returns a survey with > 80% of the questions responded to
I imagine this involves creating a new variable with if/then and do statements that checks missing and non-missing cells and assigns each participant to one of the levels above, but I'm confused as to how to go about it.
Thank you so much for reading!
No, it's simpler than that. You can use the N() and NMISS() functions to determine the number of questions answered and the number of questions not answered. From there, you can compute the percent of questions answered.
data want;
set sample;
n_answered=n(of var1-var5);
n_not_answered=nmiss(of var1-var5);
/* HOMEWORK ASSIGNMENT: Compute percents and tiers here */
run;
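One possible way to finish the step above, as a sketch rather than the definitive answer: it assumes the five-question sample data, takes the tier cut-offs straight from the question (more than 1 question answered, 50-80%, over 80%), and stores each tier as a 0/1 flag so that the mean of a flag is the rate for that tier. The names pct_answered, is_response, is_partial and is_complete are made up for illustration.

```sas
data want;
    set sample;
    n_answered     = n(of var1-var5);       /* questions answered         */
    n_not_answered = nmiss(of var1-var5);   /* questions left blank       */
    pct_answered   = n_answered / (n_answered + n_not_answered);

    /* A comparison evaluates to 1 (true) or 0 (false) */
    is_response = (n_answered > 1);              /* tier 1: answers >1 question */
    is_partial  = (0.5 <= pct_answered <= 0.8);  /* tier 2: 50-80% answered     */
    is_complete = (pct_answered > 0.8);          /* tier 3: more than 80%       */
run;

/* The mean of a 0/1 flag is the proportion of participants in that tier */
proc means data=want n mean;
    var is_response is_partial is_complete;
run;
```

Whether exactly 80% counts as partial or complete is not spelled out in the question; here it is treated as partial.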
I don't think your sample data is reflective of your actual data - you likely will have a mix of character and numeric variables for one thing.
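data recoded;
set sample;
/* Replace the placeholders with your own variable lists */
array Q_num (*) <list of numeric variables>;
array Q_char(*) <list of character variables>;
/* N() and CMISS() need the OF keyword to take an array as a variable list */
char_filled = dim(Q_char) - cmiss(of Q_char(*)); /* character questions answered */
num_filled = n(of Q_num(*)); /* numeric questions answered */
tot_filled = char_filled + num_filled;
pct_filled = tot_filled / (dim(Q_char) + dim(Q_num));
length response $18;
/* Cut-offs follow the question: 50-80% is partial, above 80% is complete */
if 0.5 <= pct_filled <= 0.8 then response = "Partial Completion";
else if pct_filled > 0.8 then response = "Completed";
else response = "TODO"; /* below 50%: assign the remaining tier here */
run;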
Hi Reeza,
Thanks for the concern. The variables in use on the actual survey are all numeric, as they are coded with numeric labels that correspond to survey responses.
Are your variables Var1 through Varn all of the same type?
You should show what you expect the result to be.
This will give you the proportion of completed questions, provided all of var1 to varn are numeric:
data want;
set sample;
array v (*) var1-var5; /* this statement requires all the variables to be numeric */
PercentComplete = n(of v(*)) / dim(v);
run;
If you have a mix of variable types then you need another method to count how many variables have been completed.
That can be done with something like the CMISS() function, which counts missing values among both character and numeric arguments.
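A minimal sketch of counting answers across mixed variable types, assuming a made-up dataset with two numeric questions (q1, q2) and two character questions (q3, q4). These names and values are hypothetical and are not from the actual survey. CMISS() accepts both character and numeric arguments, so a single call covers every question.

```sas
/* Hypothetical survey: q1-q2 numeric, q3-q4 character */
data mixed;
    input ID q1 q2 q3 $ q4 $;
    datalines;
1 3 . yes no
2 . . . maybe
3 1 2 no yes
;
run;

data mixed_counts;
    set mixed;
    n_questions     = 4;                       /* total questions asked    */
    n_missing       = cmiss(q1, q2, q3, q4);   /* works for both types     */
    PercentComplete = (n_questions - n_missing) / n_questions;
run;
```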
To label each range with its tier, you can either apply a custom format to PercentComplete or use If/then/else logic.
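The custom format option could look something like the sketch below. It assumes PercentComplete is stored as a proportion between 0 and 1 in a dataset called want; the format name tierfmt and the "Below 50%" label are made up for illustration. The cut-offs follow the question, with exactly 80% treated as partial.

```sas
proc format;
    value tierfmt
        0.5 - 0.8   = "Partial Completion"   /* 50% through 80%, inclusive */
        0.8 <- high = "Completion"           /* strictly above 80%         */
        other       = "Below 50%";           /* everything else            */
run;

/* PROC FREQ groups by the formatted value, giving a count and percent per tier */
proc freq data=want;
    tables PercentComplete;
    format PercentComplete tierfmt.;
run;
```

The "Response" tier in the question depends on the number of questions answered rather than on a percentage, so it would still need its own if/then check.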
FYI - none of these will correctly account for skip logic type questions. Depending on what you're doing with this, that may or may not be important.