From a data set with 40 schools’ responses to a 9 Likert Scale question survey, (sample below and excel file attached) I am trying to get one excel file with the percentages of the Strongly Disagree/Disagree (1 and 2) and of the Agree/Strongly Agree. (4 and 5). I have tried several options in Proc Freq, Proc Tabulate and Proc Survey Freq, and cannot get those percentages all in one table.
sch | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 |
sch1 | 5 | 4 | 5 | 4 | 4 | 4 | 4 | 4 | 4 |
sch1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
sch1 | 3 | 2 | 4 | 2 | 3 | 2 | 2 | 2 | 2 |
sch1 | 5 | 5 | 5 | 4 | 5 | 4 | 5 | 5 | 5 |
sch1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 5 |
sch1 | 5 | 5 | 5 | 4 | 5 | 5 | 5 | 5 | 5 |
sch2 | 5 | 5 | 5 | 4 | 4 | 4 | 2 | 2 | 4 |
sch2 | 1 | 1 | 4 | 4 | 5 | 4 | 1 | 1 | 4 |
sch2 | 5 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 4 |
sch2 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 4 |
sch2 | 4 | 4 | 5 | 4 | 4 | 5 | 4 | 4 | 5 |
sch2 | 3 | 2 | 4 | 2 | 4 | 2 | 2 | 4 | 4 |
sch3 | 5 | 5 | 5 | 5 | 4 | 2 | 5 | 4 | 5 |
sch3 | 2 | 2 | 1 | 3 | 1 | 1 | 1 | 1 | 1 |
sch3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
sch3 | 2 | 5 | 4 | 1 | 1 | 1 | 1 | 1 | 1 |
sch4 | 4 | 2 | 2 | 5 | 2 | 5 | 2 | 2 | 3 |
sch4 | 2 | 2 | 3 | 2 | 4 | 4 | 2 | 4 | 5 |
sch4 | 1 | 1 | 2 | 1 | 3 | 2 | 1 | 1 | 4 |
sch4 | 5 | 5 | 5 | 4 | 4 | 5 | 5 | 5 | 4 |
sch4 | 4 | 4 | 4 | 4 | 4 | 2 | 4 | 4 | 4 |
sch5 | 4 | 2 | 4 | 3 | 4 | 4 | 2 | 3 | 4 |
sch5 | 4 | 2 | 4 | 2 | 4 | 2 | 2 | 2 | 4 |
sch5 | 5 | 4 | 3 | 5 | 4 | 4 | 4 | 5 | 4 |
After converting so 1&2 = D, 3 = O and 4 &5 = A, the closest I have come is 9 separate tables for each question as below using a proc freq crosslist with nocol nofreq nopercent.
proc freq data=ParSurFall20aod; tables school*ans1 /crosslist out=aodcrlist1 nocol nofreq nopercent; run;
But when I try to merge those together I lose the percentages for all but the first question.
Table of School by Q1 | ||
School | Q1 | Row |
Percent | ||
Sch1 | A | 79.17 |
| D | 12.5 |
| O | 8.33 |
| Total | 100 |
Sch2 | A | 91.43 |
| D | 5.71 |
| O | 2.86 |
| Total | 100 |
Sch3 | A | 75 |
| D | 12.5 |
| O | 12.5 |
| Total | 100 |
Sch4 | A | 80.95 |
| D | 11.9 |
| O | 7.14 |
| Total | 100 |
Sch5 | A | 85.71 |
| D | 0 |
| O | 14.29 |
| Total | 100 |
How can I get this table for all 9 questions without manipulating it all in excel every time we do a survey of our schools.
Question | Q1 | Q2 | Q1 | Q2 |
Choice | 1&2 | 1&2 | 4&5 | 4&5 |
Sch1 | 12.5 | 25.0 | 79.2 | 68.8 |
Sch2 | 5.7 | 17.1 | 91.4 | 74.3 |
Sch3 | 12.5 | 6.3 | 75.0 | 87.5 |
Sch4 | 11.9 | 31.0 | 81.0 | 59.5 |
Sch5 | 0.0 | 28.6 | 85.7 | 71.4 |
Sorry, the n is essential for the first transpose to work, and it has to be set in the initial dataset from the automatic variable _N_:
data have;
input sch $ q1-q9;
n = _n_;
datalines;
sch1
5
4
5
4
4
4
4
4
4
sch1
5
5
5
5
5
5
5
5
5
sch1
3
2
4
2
3
2
2
2
2
sch1
5
5
5
4
5
4
5
5
5
sch1
5
5
5
5
5
5
5
4
5
sch1
5
5
5
4
5
5
5
5
5
sch2
5
5
5
4
4
4
2
2
4
sch2
1
1
4
4
5
4
1
1
4
sch2
5
5
5
4
4
4
5
5
4
sch2
4
4
4
4
4
4
3
4
4
sch2
4
4
5
4
4
5
4
4
5
sch2
3
2
4
2
4
2
2
4
4
sch3
5
5
5
5
4
2
5
4
5
sch3
2
2
1
3
1
1
1
1
1
sch3
4
4
4
4
4
4
4
4
4
sch3
2
5
4
1
1
1
1
1
1
sch4
4
2
2
5
2
5
2
2
3
sch4
2
2
3
2
4
4
2
4
5
sch4
1
1
2
1
3
2
1
1
4
sch4
5
5
5
4
4
5
5
5
4
sch4
4
4
4
4
4
2
4
4
4
sch5
4
2
4
3
4
4
2
3
4
sch5
4
2
4
2
4
2
2
2
4
sch5
5
4
3
5
4
4
4
5
4
;
Please post example data as a data step in the future, so we don't get ugly code from the website formatting.
With this dataset and my code, you get this dataset:
sch q1A q2A q2D q3A q4A q4D q5A q6A q6D q7A q7D q8A q8D q9A q9D q1D q3D q5D sch1 83.3% 83.3% 16.7% 100% 83.3% 16.7% 83.3% 83.3% 16.7% 83.3% 16.7% 83.3% 16.7% 83.3% 16.7% . . . sch2 66.7% 66.7% 33.3% 100% 83.3% 16.7% 100% 83.3% 16.7% 33.3% 50.0% 66.7% 33.3% 100% . 16.7% . . sch3 50.0% 75.0% 25.0% 75.0% 50.0% 25.0% 50.0% 25.0% 75.0% 50.0% 50.0% 50.0% 50.0% 50.0% 50.0% 50.0% 25.0% 50.0% sch4 60.0% 40.0% 60.0% 40.0% 60.0% 40.0% 60.0% 60.0% 40.0% 40.0% 60.0% 60.0% 40.0% 80.0% . 40.0% 40.0% 20.0% sch5 100% 33.3% 66.7% 66.7% 33.3% 33.3% 100% 66.7% 33.3% 33.3% 66.7% 33.3% 33.3% 100% . . . .
To get the horizontal order right, you can add a pattern set of all possible Q/A combinations to the top of the dataset, and filter it out in the end result.
So you want to end up with 3 times as many columns?
>But when I try to merge
I don't see much code, and certainly no merge.
Chris,
Thanks for asking. Yes, I actually want to get 18 columns for each school. 1 for each question with % 1or2 and 1 with %4or5.
This code gets me 9 crosslists with the percentages I need, as in the initial post. When I try to merge all the crosslists, with the code below, I get the output below.
data ghost2;
set ghost;
ans1 = " "; if Q1 ne . and Q1 le 2 then ans1 = "D" ;
else if Q1 ne . and Q1 ge 4 then ans1 = "A" ;
else if Q1 ne . and Q1 eq 3 then ans1 = "O" ;
ans2 = " "; if Q2 ne . and Q2 le 2 then ans2 = "D" ;
else if Q2 ne . and Q2 ge 4 then ans2 = "A" ;
else if Q2 ne . and Q2 eq 3 then ans2 = "O" ;
ans3 = " "; if Q3 ne . and Q3 le 2 then ans3 = "D" ;
else if Q3 ne . and Q3 ge 4 then ans3 = "A" ;
else if Q3 ne . and Q3 eq 3 then ans3 = "O" ;
ans4 = " "; if Q4 ne . and Q4 le 2 then ans4 = "D" ;
else if Q4 ne . and Q4 ge 4 then ans4 = "A" ;
else if Q4 ne . and Q4 eq 3 then ans4 = "O" ;
ans5 = " "; if Q5 ne . and Q5 le 2 then ans5 = "D" ;
else if Q5 ne . and Q5 ge 4 then ans5 = "A" ;
else if Q5 ne . and Q5 eq 3 then ans5 = "O" ;
ans6 = " "; if Q6 ne . and Q6 le 2 then ans6 = "D" ;
else if Q6 ne . and Q6 ge 4 then ans6 = "A" ;
else if Q6 ne . and Q6 eq 3 then ans6 = "O" ;
ans7 = " "; if Q7 ne . and Q7 le 2 then ans7 = "D" ;
else if Q7 ne . and Q7 ge 4 then ans7 = "A" ;
else if Q7 ne . and Q7 eq 3 then ans7 = "O" ;
ans8 = " "; if Q8 ne . and Q8 le 2 then ans8 = "D" ;
else if Q8 ne . and Q8 ge 4 then ans8 = "A" ;
else if Q8 ne . and Q8 eq 3 then ans8 = "O" ;
ans9 = " "; if Q9 ne . and Q9 le 2 then ans9 = "D" ;
else if Q9 ne . and Q9 ge 4 then ans9 = "A" ;
else if Q9 ne . and Q9 eq 3 then ans9 = "O" ;
run;
proc freq data=ghost2; tables sch*ans1 /crosslist out=crlist1 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans2 /crosslist out=crlist2 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans3 /crosslist out=crlist3 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans4 /crosslist out=crlist4 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans5 /crosslist out=crlist5 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans6 /crosslist out=crlist6 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans7 /crosslist out=crlist7 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans8 /crosslist out=crlist8 nocol nofreq nopercent; run;
proc freq data=ghost2; tables sch*ans9 /crosslist out=crlist9 nocol nofreq nopercent; run;
data allcrlist;
merge crlist1 crlist2 crlist3 crlist4 crlist5 crlist6 crlist7 crlist8 crlist9
;
by sch;
run;
sch | ans1 | COUNT | PERCENT | ans2 | ans3 | ans4 | ans5 | ans6 | ans7 | ans8 | ans9 |
sch1 | A | 5 | 20.83333 | A | A | A | A | A | A | A | A |
sch1 | O | 1 | 4.166667 | D | A | D | O | D | D | D | D |
sch2 | A | 6 | 25 | A | A | A | A | A | A | A | A |
sch2 | D | 2 | 8.333333 | D | A | D | A | D | D | D | A |
sch2 | O | 1 | 4.166667 | D | A | D | A | D | O | D | A |
sch3 | A | 2 | 8.333333 | A | A | A | A | A | A | A | A |
sch3 | D | 2 | 8.333333 | D | D | D | D | D | D | D | D |
sch3 | D | 1 | 4.166667 | D | D | O | D | D | D | D | D |
sch4 | A | 4 | 16.66667 | A | A | A | A | A | A | A | A |
sch4 | D | 1 | 4.166667 | D | D | D | D | D | D | D | O |
sch4 | D | 1 | 4.166667 | D | O | D | O | D | D | D | O |
sch5 | A | 3 | 12.5 | A | A | A | A | A | A | A | A |
sch5 | A | 1 | 4.166667 | D | O | D | A | D | D | D | A |
sch5 | A | 1 | 4.166667 | D | O | O | A | D | D | O | A |
Please post the data in usable form and the code you already have.
I would start by transposing the data from wide to long, then "conversion" is assigning one format to one variable. Afterwards proc summary to count per school, question and rating, then ... mmh ... maybe proc tabulate.
See this:
proc transpose
data=have
out=long (
drop=n
rename=(_name_=question)
)
;
by n sch;
var q:;
run;
proc format;
value ans
1,2 = 'D'
3 = 'O'
4,5 = 'A'
;
run;
data long2;
set long;
answer = put(col1,ans.);
drop col1;
run;
proc summary data=long2 nway;
class sch question answer;
output out=sum (drop=_type_);
run;
data sum_long (where=(answer ne 'O'));
do until (last.question);
set sum;
by sch question;
count = sum(count,_freq_);
end;
format percent percent7.2;
do until (last.question);
set sum;
by sch question;
percent = _freq_ / count;
output;
end;
keep sch question answer percent;
run;
proc transpose data=sum_long out=want (drop=_name_);
by sch;
var percent;
id question answer;
run;
The resulting dataset is quite close to what you want.
Thanks, Kurt.
That did something, but I had to drop 'n' from the by statement 'n sch' statement to go anywhere. I did get through all the data steps, but all the QA1, QA2,...QD1, QD2,.. report out at 100%.
If I delete (where=(answer ne 'O')), then I get 27 columns with 100%, so if I could get the correct %s, for A,D,O, then i could use a formula, or drop the O columns.
I think something essential is lost in the shift from Long to Long2 (short snip below), but I don't see it.
Data Long | DATa long2 | |||||||||||||||||
sch | question | _LABEL_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | sch | question | _LABEL_ | COL2 | COL3 | COL4 | COL5 | COL6 | answer | |
sch1 | Q1 | Q1 | 5 | 5 | 3 | 5 | 5 | 5 | sch1 | Q1 | Q1 | 5 | 3 | 5 | 5 | 5 | A | |
sch1 | Q2 | Q2 | 4 | 5 | 2 | 5 | 5 | 5 | sch1 | Q2 | Q2 | 5 | 2 | 5 | 5 | 5 | A | |
sch1 | Q3 | Q3 | 5 | 5 | 4 | 5 | 5 | 5 | sch1 | Q3 | Q3 | 5 | 4 | 5 | 5 | 5 | A | |
sch1 | Q4 | Q4 | 4 | 5 | 2 | 4 | 5 | 4 | sch1 | Q4 | Q4 | 5 | 2 | 4 | 5 | 4 | A | |
sch1 | Q5 | Q5 | 4 | 5 | 3 | 5 | 5 | 5 | sch1 | Q5 | Q5 | 5 | 3 | 5 | 5 | 5 | A | |
sch1 | Q6 | Q6 | 4 | 5 | 2 | 4 | 5 | 5 | sch1 | Q6 | Q6 | 5 | 2 | 4 | 5 | 5 | A | |
sch1 | Q7 | Q7 | 4 | 5 | 2 | 5 | 5 | 5 | sch1 | Q7 | Q7 | 5 | 2 | 5 | 5 | 5 | A | |
sch1 | Q8 | Q8 | 4 | 5 | 2 | 5 | 4 | 5 | sch1 | Q8 | Q8 | 5 | 2 | 5 | 4 | 5 | A | |
sch1 | Q9 | Q9 | 4 | 5 | 2 | 5 | 5 | 5 | sch1 | Q9 | Q9 | 5 | 2 | 5 | 5 | 5 | A | |
sch2 | Q1 | Q1 | 5 | 1 | 5 | 4 | 4 | 3 | sch2 | Q1 | Q1 | 1 | 5 | 4 | 4 | 3 | A | |
sch2 | Q2 | Q2 | 5 | 1 | 5 | 4 | 4 | 2 | sch2 | Q2 | Q2 | 1 | 5 | 4 | 4 | 2 | A | |
sch2 | Q3 | Q3 | 5 | 4 | 5 | 4 | 5 | 4 | sch2 | Q3 | Q3 | 4 | 5 | 4 | 5 | 4 | A | |
sch2 | Q4 | Q4 | 4 | 4 | 4 | 4 | 4 | 2 | sch2 | Q4 | Q4 | 4 | 4 | 4 | 4 | 2 | A | |
sch2 | Q5 | Q5 | 4 | 5 | 4 | 4 | 4 | 4 | sch2 | Q5 | Q5 | 5 | 4 | 4 | 4 | 4 | A | |
sch2 | Q6 | Q6 | 4 | 4 | 4 | 4 | 5 | 2 | sch2 | Q6 | Q6 | 4 | 4 | 4 | 5 | 2 | A | |
sch2 | Q7 | Q7 | 2 | 1 | 5 | 3 | 4 | 2 | sch2 | Q7 | Q7 | 1 | 5 | 3 | 4 | 2 | D | |
sch2 | Q8 | Q8 | 2 | 1 | 5 | 4 | 4 | 4 | sch2 | Q8 | Q8 | 1 | 5 | 4 | 4 | 4 | D | |
sch2 | Q9 | Q9 | 4 | 4 | 4 | 4 | 5 | 4 | sch2 | Q9 | Q9 | 4 | 4 | 4 | 5 | 4 | A |
Sorry, the n is essential for the first transpose to work, and it has to be set in the initial dataset from the automatic variable _N_:
data have;
input sch $ q1-q9;
n = _n_;
datalines;
sch1
5
4
5
4
4
4
4
4
4
sch1
5
5
5
5
5
5
5
5
5
sch1
3
2
4
2
3
2
2
2
2
sch1
5
5
5
4
5
4
5
5
5
sch1
5
5
5
5
5
5
5
4
5
sch1
5
5
5
4
5
5
5
5
5
sch2
5
5
5
4
4
4
2
2
4
sch2
1
1
4
4
5
4
1
1
4
sch2
5
5
5
4
4
4
5
5
4
sch2
4
4
4
4
4
4
3
4
4
sch2
4
4
5
4
4
5
4
4
5
sch2
3
2
4
2
4
2
2
4
4
sch3
5
5
5
5
4
2
5
4
5
sch3
2
2
1
3
1
1
1
1
1
sch3
4
4
4
4
4
4
4
4
4
sch3
2
5
4
1
1
1
1
1
1
sch4
4
2
2
5
2
5
2
2
3
sch4
2
2
3
2
4
4
2
4
5
sch4
1
1
2
1
3
2
1
1
4
sch4
5
5
5
4
4
5
5
5
4
sch4
4
4
4
4
4
2
4
4
4
sch5
4
2
4
3
4
4
2
3
4
sch5
4
2
4
2
4
2
2
2
4
sch5
5
4
3
5
4
4
4
5
4
;
Please post example data as a data step in the future, so we don't get ugly code from the website formatting.
With this dataset and my code, you get this dataset:
sch q1A q2A q2D q3A q4A q4D q5A q6A q6D q7A q7D q8A q8D q9A q9D q1D q3D q5D sch1 83.3% 83.3% 16.7% 100% 83.3% 16.7% 83.3% 83.3% 16.7% 83.3% 16.7% 83.3% 16.7% 83.3% 16.7% . . . sch2 66.7% 66.7% 33.3% 100% 83.3% 16.7% 100% 83.3% 16.7% 33.3% 50.0% 66.7% 33.3% 100% . 16.7% . . sch3 50.0% 75.0% 25.0% 75.0% 50.0% 25.0% 50.0% 25.0% 75.0% 50.0% 50.0% 50.0% 50.0% 50.0% 50.0% 50.0% 25.0% 50.0% sch4 60.0% 40.0% 60.0% 40.0% 60.0% 40.0% 60.0% 60.0% 40.0% 40.0% 60.0% 60.0% 40.0% 80.0% . 40.0% 40.0% 20.0% sch5 100% 33.3% 66.7% 66.7% 33.3% 33.3% 100% 66.7% 33.3% 33.3% 66.7% 33.3% 33.3% 100% . . . .
To get the horizontal order right, you can add a pattern set of all possible Q/A combinations to the top of the dataset, and filter it out in the end result.
Kurt,
You rock!
Thanks for taking the time to go through this both times.
You have saved me hours of work for each time this survey is collected.
@plzsiga wrote:
Kurt,
You rock!
Thanks for taking the time to go through this both times.
You have saved me hours of work for each time this survey is collected.
As it should be. Let the code do the work, and SAS is very good at that.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.