Hi All
I have a dataset of exam candidates with multiple choice test answers (A-D) which looks something like this:
Q1 | Q2 | Q3 | Q4 | Q5 | |
Correct Answer | A | C | D | B | C |
Candidate 1 | C | C | A | B | A |
Candidate 2 | A | C | C | B | C |
Candidate 3 | B | A | B | B | B |
Where 3 candidates have taken a multiple choice test, and the correct answer is in row 1, their answers in each subsequent row. I need some code to read the correct answer from row 1 for each question and return a '1' for a correct answer and a '0' for a wrong answer some something like this:
Output | |||||
Q1 | Q2 | Q3 | Q4 | Q5 | |
Candidate 1 | 0 | 1 | 0 | 1 | 0 |
Candidate 2 | 1 | 1 | 0 | 1 | 1 |
Candidate 3 | 0 | 0 | 0 | 1 | 0 |
I've found lots of examples of code to compare to a previous row but all my rows from 2 onwards need to compare to row one, and I need to do this for each variable.
Can anyone help with this?
Thanks
Andy
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;
One good way: create a temporary array from the values in the first observation. Then compare on later observations to the values in the array. You cannot replace Q1-Q5 with 0/1 values. They are character, not numeric. So you will need to create a new set of numeric variables:
data want;
set have;
array correct {5} $ 1 _temporary_;
array q {5} q1-q5;
array num {5} n1-n5;
if _n_=1 then do k=1 to 5;
correct{k} = q{k};
end;
else do k=1 to 5;
num{k} = (correct{k} = q{k});
end;
drop k;
run;
Use RETAIN to create variable that is kept until reset. Then only set the values on the first record.
data want; set have; retain correctQ1 - correctq5; if _n_=1 then do; correctq1 = q1; correctq2 = q2; /*follow the obvious pattern*/ end; else do; q1 = put( (q1=correctq1),f1.); /* other comparisons similar*/ output; end; drop correct: ; run;
Of course since your Q1 to Q5 are originally character then final result, unless you jump through some hoops, will be character for final Q1 - Q5 variables. I used the put to avoid log notes about conversion from numeric to character values.
Many thanks to you guys for your replies, I realize I should have specified since it makes a difference that my real data has numbers instead of letters so something like this:
Q1 | Q2 | Q3 | Q4 | Q5 | |
Correct Answer | 1 | 3 | 4 | 2 | 3 |
Candidate 1 | 3 | 3 | 1 | 2 | 1 |
Candidate 2 | 1 | 3 | 3 | 2 | 3 |
Candidate 3 | 2 | 1 | 2 | 2 | 2 |
I don't want to replace the fields with new data, I'll want to create a new dataset, with the correct answer and candidate numbers, and then after that, I'll want to calculate the percentage of answers that are correct for each question, maybe transposing the data and using Proc Freq
thanks
Andy
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
data want;
set have;
array qq(*) q:;
array t(5) _temporary_;;
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop i;
run;
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
data want;
set have;
array qq(*) q:;
array t(5) _temporary_;
if _n_=1 then call pokelong(peekclong(addrlong(qq(1)),40),addrlong(t(1)),40);
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop i;
run;
data have;
input var & $15. (Q1 Q2 Q3 Q4 Q5) (:$1.);
cards;
Correct Answer A C D B C
Candidate 1 C C A B A
Candidate 2 A C C B C
Candidate 3 B A B B B
;
data want;
set have;
array qq(*) q:;
array t(5)$1 _temporary_;;
array _q(5);
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
_q(i)=t(i)=qq(i);
end;
drop q: i;
run;
Try One-Many join:
proc sql;
create table want as
select a.var,
case when a.Q1=b.Q1 then 1 else 0 end as Q1,
case when a.Q2=b.Q2 then 1 else 0 end as Q2,
case when a.Q3=b.Q3 then 1 else 0 end as Q3,
case when a.Q4=b.Q4 then 1 else 0 end as Q4,
case when a.Q5=b.Q5 then 1 else 0 end as Q5
from have as a,have as b
where strip(a.var)<>'Correct Answer' and strip(b.var)='Correct Answer'
;
quit;
Your idea may be good, but will not handle efficiently if there are numerous questions and is pretty much hard coding. Even if you generate tons of case when with macro, the sql processor is not efficient or robust in handling wide datasets. Also, an algorithm is only valid if it can scale from Q1 to Qn (where n should be expected to be a huge number)
Of course I agree WRT thread, it seems OP has only 5 questions and it doesn't matter but a solution should handle and scale at ease. The thought of proc sql should also pave way to transpose(transform) and then an inner join and boolean, and finally retranspose will suffice, but wide*wide proc sql would be bizarre for the sql processor/optimizer IMHO.
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
proc transpose data=have out=w;
by var notsorted;
var q:;
run;
proc sql;
create table w1 as
select a.var,a._name_,a.col1=b.col1 as q
from w(where=(var ne 'Correct Answer')) a, w(where=(var eq 'Correct Answer')) b
where a._name_=b._name_
order by var, _name_;
quit;
proc transpose data=w1 out=want(drop=_name_);
by var notsorted;
var q;
run;
Either way, this is not even close the power of temporary array in datastep with a neat one pass solution
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;
You've been busy, with many alternatives, and I see you present the advantages of a (1) conditional SET statement. You can also probably dispense with the "where=" options and just use "firstobs=2".
data have;
input var & $15. Q1 Q2 Q3 Q4 Q5 ;
cards;
Correct Answer 1 3 4 2 3
Candidate 1 3 3 1 2 1
Candidate 2 1 3 3 2 3
Candidate 3 2 1 2 2 2
;
data want (drop=i ans:);
if _n_=1 then set have (rename=(q1-q5=ans1-ans5));
array ans {*} ans:;
set have (firstobs=2);
array q {*} q:;
do i=1 to dim(ans);
ncorrect=sum(ncorrect,q{i}=ans{i});
end;
run;
Thanks @mkeintz Nice one indeed. LOL Yes, I was so bored. That's typical of winter break I suppose and we at DePaul are no different. 🙂 You still awake? I m about to doze off. Cya tomorrow
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.