BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AJChamberlain
Obsidian | Level 7

Hi All

 

I have a dataset of exam candidates with multiple choice test answers (A-D) which looks something like this:

 

 Q1Q2Q3Q4Q5
Correct AnswerACDBC
Candidate 1CCABA
Candidate 2ACCBC
Candidate 3BABBB

 

Where 3 candidates have taken a multiple choice test, and the correct answer is in row 1, their answers in each subsequent row. I need some code to read the correct answer from row 1 for each question and return a '1' for a correct answer and a '0' for a wrong answer some something like this:

 

Output     
 Q1Q2Q3Q4Q5
Candidate 101010
Candidate 211011
Candidate 300010

I've found lots of examples of code to compare to a previous row but all my rows from 2 onwards need to compare to row one, and I need to do this for each variable.

 

Can anyone help with this?

 

Thanks

Andy

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;

View solution in original post

12 REPLIES 12
Astounding
PROC Star

One good way:  create a temporary array from the values in the first observation.  Then compare on later observations to the values in the array.  You cannot replace Q1-Q5 with 0/1 values.  They are character, not numeric.  So you will need to create a new set of numeric variables:

 

data want;

set have;

array correct {5} $ 1 _temporary_;

array q {5} q1-q5;

array num {5} n1-n5;

if _n_=1 then do k=1 to 5;

   correct{k} = q{k};

end;

else do k=1 to 5;

   num{k} = (correct{k} = q{k});

end;

drop k;

run;

ballardw
Super User

Use RETAIN to create variable that is kept until reset. Then only set the values on the first record.

data want;
   set have;
   retain correctQ1 - correctq5;
   if _n_=1 then do;
      correctq1 = q1;
      correctq2 = q2;
      /*follow the obvious pattern*/
   end;
   else do;
      q1 = put( (q1=correctq1),f1.);
      /* other comparisons similar*/
      output;
   end;
   drop correct: ;
run;

Of course since your Q1 to Q5 are originally character then final result, unless you jump through some hoops, will be character for final Q1 - Q5 variables. I used the put to avoid log notes about conversion from numeric to character values.

 

AJChamberlain
Obsidian | Level 7

Many thanks to you guys for your replies, I realize I should have specified since it makes a difference that my real data has numbers instead of letters so something like this:

 

 Q1Q2Q3Q4Q5
Correct Answer13423
Candidate 133121
Candidate 213323
Candidate 32122

2

 

I don't want to replace the fields with new data, I'll want to create a new dataset, with the correct answer and candidate numbers, and then after that, I'll want to calculate the percentage of answers that are correct for each question, maybe transposing the data and using Proc Freq

 

thanks

Andy

novinosrin
Tourmaline | Level 20
data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have;
array qq(*) q:;
array t(5) _temporary_;;
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop  i;
run;
novinosrin
Tourmaline | Level 20
data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have;
array qq(*) q:;
array t(5)  _temporary_;
if _n_=1 then call pokelong(peekclong(addrlong(qq(1)),40),addrlong(t(1)),40);
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop i;
run;
novinosrin
Tourmaline | Level 20
data have;
input  var & $15. (Q1	Q2	Q3	Q4	Q5) (:$1.);
cards;
Correct Answer		A	C	D	B	C
Candidate 1		C	C	A	B	A
Candidate 2		A	C	C	B	C
Candidate 3		B  	A	B	B	B
;
data want;
set have;
array qq(*) q:;
array t(5)$1 _temporary_;;
array _q(5);
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
_q(i)=t(i)=qq(i);
end;
drop q: i;
run;

 

SuryaKiran
Meteorite | Level 14

Try One-Many join:

proc sql;
create table want as 
select a.var,
		case when a.Q1=b.Q1 then 1 else 0 end as Q1,
		case when a.Q2=b.Q2 then 1 else 0 end as Q2,
		case when a.Q3=b.Q3 then 1 else 0 end as Q3,
		case when a.Q4=b.Q4 then 1 else 0 end as Q4,
		case when a.Q5=b.Q5 then 1 else 0 end as Q5
	from have as a,have as b
where strip(a.var)<>'Correct Answer' and strip(b.var)='Correct Answer'
;
quit;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

Your idea may be good, but will not handle efficiently if there are numerous questions and is pretty much hard coding. Even if you generate tons of case when with macro, the sql processor is not efficient or robust in handling wide datasets. Also, an algorithm is only valid if it can scale from  Q1 to Qn (where n should be expected to be a huge number)

 

Of course I agree WRT  thread, it seems OP has only 5 questions and it doesn't matter but a solution should handle and scale at ease. The thought of proc sql should also pave way to transpose(transform) and then an inner join and boolean, and finally retranspose will suffice, but wide*wide proc sql would be bizarre for the sql processor/optimizer IMHO.

 

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
proc transpose data=have out=w;
by var notsorted;
var q:;
run;

proc sql;
create table w1 as
select  a.var,a._name_,a.col1=b.col1 as q
from w(where=(var ne 'Correct Answer')) a, w(where=(var eq 'Correct Answer')) b
where a._name_=b._name_
order by var, _name_;
quit;

proc transpose data=w1 out=want(drop=_name_);
by var notsorted;
var q;
run;

Either way, this is not even close the power of temporary array in datastep with a neat one pass solution

novinosrin
Tourmaline | Level 20

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;
mkeintz
PROC Star

@novinosrin

 

You've been busy, with many alternatives, and I see you present the advantages of a (1) conditional SET statement.  You can also probably dispense with the "where=" options and just use "firstobs=2".

 

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;


data want (drop=i ans:);
  if _n_=1 then set have (rename=(q1-q5=ans1-ans5));
  array ans {*} ans:;

  set have (firstobs=2);
  array q {*} q:;

  do i=1 to dim(ans);
    ncorrect=sum(ncorrect,q{i}=ans{i});
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Thanks @mkeintz  Nice one indeed. LOL Yes, I was so bored. That's typical of winter break I suppose and we at DePaul are no different. 🙂 You still awake?  I m about to doze off. Cya tomorrow

AmarnathEklare
Calcite | Level 5
Using Transpose and array we can get the result:
 
proc transpose  data=have out=want name=ID;
ID ID;
VAR Q1-Q5;
run;
data want;
set want;
i=0;
array Answer(1) Correct_Answer;
array Candidate(3) Candidate1-Candidate3;
do i = 1 to 3;
if Answer[1] = Candidate[i] then do;
Candidate[i] = 1;
end;
else do;
Candidate[i] = 0;
end;
end;
drop i;
run;
proc transpose  data=want name=ID;
ID ID;
VAR Candidate1-Candidate3;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2631 views
  • 0 likes
  • 7 in conversation