Hi. I am very new to SAS.I’m working on a project which involves recoding a large amount of evaluation survey questions, scoring them and then aggregating the scores.
In the dataset below thereare the values of 1,2,3,4. “4” means the question was not applicable in those cases. So, for var1 there are 9 cases applicable, for var2 there are 7 and var3all 10 cases are applicable.
data have;
infile datalines dlm=',' dsd;
input Record $ var1 var2 var3;
datalines;
"Record1", 1,2,2
"Record2 ",1,2,2
"Record3 ",1,2,2
"Record4 ",2,3,3
"Record5 ",1,1,1
" Record6",3,3,3
" Record7",1,3,3
" Record8",3,4,1
" Record9",1,4,1
" Record10",4,4,3
;
run;
Questions are weighted differently –there are three categories 1 point, 2 point, 3 point questions. I recode the questions so the responses are reflective of their point value.
Initial codeing:
1=YES
2=NO
3=Incomplete
4=N/A
Recode:
data new;
set old;
if var1=1 then Var1New= 1; else;
if var1=2 then Var1New= 0; else;
if var1=3 then Var1New=.5; else;
if var1=4 then Var1New=.;else
if var2=1 then Var2New= 2; else;
if var2=2 then Var2New= 0; else;
if var2=3 then Var2New=1; else;
if var2=4 then Var2New=.;else
if var3=1 then Var3new=3; else;
if var3=2 then Var3new=0; else;
if var3=3 then Var3new=1.5; else;
if var3=4 then Var3new=.; else;
run;
and this is where my SAS knowledge gets fuzzy (again, i am a SAS beginner). I need to award each Var a score and then aggregate them for a final score. The math should look something like this table:
| Question Point Value | Records Applicable | Highest possible Score | Points Awarded | SCORE |
VAR1 | 1 | 9 | 1X9 records=9 | 7 | 77.7% |
VAR2 | 2 | 7 | 2X7 records=14 | 5 |
35.7% |
VAR3 | 3 | 10 | 3X10 records=30 | 16.5 | 55% |
And then total score is 28.5/ 53 = TOTAL SCORE 53.7%
So, my questions are: how do you write code to do the equations and is there a way that you can recode and perform the calculations on a large group of variables at once?
I thank you in advance.
I think there is a typo in the var3 score, it should be 15, not 16.5.
data have;
infile datalines dlm=',' dsd;
input Record $ var1 var2 var3;
datalines;
"Record1", 1,2,2
"Record2 ",1,2,2
"Record3 ",1,2,2
"Record4 ",2,3,3
"Record5 ",1,1,1
"Record6",3,3,3
"Record7",1,3,3
"Record8",3,4,1
"Record9",1,4,1
"Record10",4,4,3
;
data points;
input question $ response score;
datalines;
var1 1 1
var1 2 0
var1 3 0.5
var1 4 .
var2 1 2
var2 2 0
var2 3 1
var2 4 .
var3 1 3
var3 2 0
var3 3 1.5
var3 4 .
;
proc transpose data=have out=temp name=question;
var var1-var3;
by record notsorted;
run;
proc sql;
create table questionScores as
select T.question label="Question",
count(score) as applicable, M.scoreMax*calculated applicable as highestPossible,
sum(score) as pointsAwarded, calculated pointsAwarded/calculated highestPossible as questionScore
from temp as T, points as P, (select question, max(score) as scoreMax from points group by question) as M
where T.question=P.question and T.COL1=P.response and T.question=M.question
group by T.question, M.scoreMax;
create table totalScore as
select sum(pointsAwarded) as totalAwarded, sum(highestPossible) as totalPossible,
calculated totalAwarded / calculated totalPossible as score
from questionScores;
quit;
proc sql;
title "Question scores";
select * from questionScores;
title "Total score";
select * from totalScore;
quit;
PG
One slight change to your coding. You wrote:
data new;
set old;
if var1=1 then Var1New= 1; else;
if var1=2 then Var1New= 0; else;
if var1=3 then Var1New=.5; else;
if var1=4 then Var1New=.;else
if var2=1 then Var2New= 2; else;
if var2=2 then Var2New= 0; else;
if var2=3 then Var2New=1; else;
if var2=4 then Var2New=.;else
if var3=1 then Var3new=3; else;
if var3=2 then Var3new=0; else;
if var3=3 then Var3new=1.5; else;
if var3=4 then Var3new=.; else;
run;
What would work is:
data new;
set old;
if var1=1 then Var1New= 1;
else if var1=2 then Var1New= 0;
else if var1=3 then Var1New=.5;
else if var1=4 then Var1New=.;
if var2=1 then Var2New= 2;
else if var2=2 then Var2New= 0;
else if var2=3 then Var2New=1;
else if var2=4 then Var2New=.;
if var3=1 then Var3new=3;
else if var3=2 then Var3new=0;
else if var3=3 then Var3new=1.5;
else if var3=4 then Var3new=.;
run;
Are the ten records your entire dataset or does every subject have a set of 10 records? And, if it is the latter, will there always be 10 records per subject and will they always be ordered 1 thru 10? And, if there are a number of subjects is there, or should there be, an ID field?
Thanks! In this case yes, the 10 records are the entire dataset. You could say the record variable is the ID variable for each case.
Thanks.
I think there is a typo in the var3 score, it should be 15, not 16.5.
data have;
infile datalines dlm=',' dsd;
input Record $ var1 var2 var3;
datalines;
"Record1", 1,2,2
"Record2 ",1,2,2
"Record3 ",1,2,2
"Record4 ",2,3,3
"Record5 ",1,1,1
"Record6",3,3,3
"Record7",1,3,3
"Record8",3,4,1
"Record9",1,4,1
"Record10",4,4,3
;
data points;
input question $ response score;
datalines;
var1 1 1
var1 2 0
var1 3 0.5
var1 4 .
var2 1 2
var2 2 0
var2 3 1
var2 4 .
var3 1 3
var3 2 0
var3 3 1.5
var3 4 .
;
proc transpose data=have out=temp name=question;
var var1-var3;
by record notsorted;
run;
proc sql;
create table questionScores as
select T.question label="Question",
count(score) as applicable, M.scoreMax*calculated applicable as highestPossible,
sum(score) as pointsAwarded, calculated pointsAwarded/calculated highestPossible as questionScore
from temp as T, points as P, (select question, max(score) as scoreMax from points group by question) as M
where T.question=P.question and T.COL1=P.response and T.question=M.question
group by T.question, M.scoreMax;
create table totalScore as
select sum(pointsAwarded) as totalAwarded, sum(highestPossible) as totalPossible,
calculated totalAwarded / calculated totalPossible as score
from questionScores;
quit;
proc sql;
title "Question scores";
select * from questionScores;
title "Total score";
select * from totalScore;
quit;
PG
Thanks so much! This works great.
One more small point to consider ...
It is possible that your variables contain missing values BEFORE you change their values. If you change "4" to missing, you won't be able to distinguish between values that were missing originally and values that were "4" originally. SAS provides a neat way around that. It allows special missing values, for example:
else if var1=4 then var1new=.A;
There are 27 special missing values (.A through .Z, and ._). They would all be thrown out of the calculations by PROC MEANS (or whatever you use for subsequent processing), but could be distinguished one from another.
If you refer to them in your code, the sorted order is ._ then . then .A through .Z, so you might have to change the way you examine for missing values. Instead of:
if var1new = . then ...
you would need to code:
if var1new <= .Z then ...
And instead of:
if var1new > . then ...
you would need to code:
if var1new > .Z then ...
Good luck.
Hi there--
You made a very good point regarding missing values. Right now, my office has a very poor way of handling them in or surveys. Do you have/or know of and very eay to understand literature regarding SAS special missing values I have been looking online and I can't find a lot.
Thanks!
Hmmm ... not sure where to find this online. But what I posted might just be enough. Experiment with it a bit and it will probably become clear. Try running some PROC FREQs with the MISSING option on the TABLES statement.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.