Solved: Compare the value of an observation in row 1 to the value in subsequen...

AJChamberlain · Posted 12-07-2018 10:09 AM

Hi All

I have a dataset of exam candidates with multiple choice test answers (A-D) which looks something like this:

	Q1	Q2	Q3	Q4	Q5
Correct Answer	A	C	D	B	C
Candidate 1	C	C	A	B	A
Candidate 2	A	C	C	B	C
Candidate 3	B	A	B	B	B

Where 3 candidates have taken a multiple choice test, and the correct answer is in row 1, their answers in each subsequent row. I need some code to read the correct answer from row 1 for each question and return a '1' for a correct answer and a '0' for a wrong answer some something like this:

Output
	Q1	Q2	Q3	Q4	Q5
Candidate 1	0	1	0	1	0
Candidate 2	1	1	0	1	1
Candidate 3	0	0	0	1	0

I've found lots of examples of code to compare to a previous row but all my rows from 2 onwards need to compare to row one, and I need to do this for each variable.

Can anyone help with this?

Thanks

Andy

novinosrin · Posted 12-07-2018 12:31 PM


data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;

View solution in original post

Astounding · Posted 12-07-2018 10:21 AM

One good way: create a temporary array from the values in the first observation. Then compare on later observations to the values in the array. You cannot replace Q1-Q5 with 0/1 values. They are character, not numeric. So you will need to create a new set of numeric variables:

data want;

set have;

array correct {5} $ 1 _temporary_;

array q {5} q1-q5;

array num {5} n1-n5;

if _n_=1 then do k=1 to 5;

correct{k} = q{k};

end;

else do k=1 to 5;

num{k} = (correct{k} = q{k});

end;

drop k;

run;

ballardw · Posted 12-07-2018 10:21 AM

Use RETAIN to create variable that is kept until reset. Then only set the values on the first record.

data want;
   set have;
   retain correctQ1 - correctq5;
   if _n_=1 then do;
      correctq1 = q1;
      correctq2 = q2;
      /*follow the obvious pattern*/
   end;
   else do;
      q1 = put( (q1=correctq1),f1.);
      /* other comparisons similar*/
      output;
   end;
   drop correct: ;
run;

Of course since your Q1 to Q5 are originally character then final result, unless you jump through some hoops, will be character for final Q1 - Q5 variables. I used the put to avoid log notes about conversion from numeric to character values.

AJChamberlain · Posted 12-07-2018 10:27 AM

Many thanks to you guys for your replies, I realize I should have specified since it makes a difference that my real data has numbers instead of letters so something like this:

	Q1	Q2	Q3	Q4	Q5
Correct Answer	1	3	4	2	3
Candidate 1	3	3	1	2	1
Candidate 2	1	3	3	2	3
Candidate 3	2	1	2	2	2

I don't want to replace the fields with new data, I'll want to create a new dataset, with the correct answer and candidate numbers, and then after that, I'll want to calculate the percentage of answers that are correct for each question, maybe transposing the data and using Proc Freq

thanks

Andy

novinosrin · Posted 12-07-2018 10:33 AM

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have;
array qq(*) q:;
array t(5) _temporary_;;
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop  i;
run;

novinosrin · Posted 12-07-2018 10:37 AM

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have;
array qq(*) q:;
array t(5)  _temporary_;
if _n_=1 then call pokelong(peekclong(addrlong(qq(1)),40),addrlong(t(1)),40);
if _n_>1;
do i=1 to dim(t);
qq(i)=t(i)=qq(i);
end;
drop i;
run;

novinosrin · Posted 12-07-2018 10:25 AM

data have;
input  var & $15. (Q1	Q2	Q3	Q4	Q5) (:$1.);
cards;
Correct Answer		A	C	D	B	C
Candidate 1		C	C	A	B	A
Candidate 2		A	C	C	B	C
Candidate 3		B  	A	B	B	B
;
data want;
set have;
array qq(*) q:;
array t(5)$1 _temporary_;;
array _q(5);
if _n_=1 then
do i=1 to dim(qq);
t(i)=qq(i);
end;
if _n_>1;
do i=1 to dim(t);
_q(i)=t(i)=qq(i);
end;
drop q: i;
run;

SuryaKiran · Posted 12-07-2018 11:15 AM

Try One-Many join:

proc sql;
create table want as 
select a.var,
		case when a.Q1=b.Q1 then 1 else 0 end as Q1,
		case when a.Q2=b.Q2 then 1 else 0 end as Q2,
		case when a.Q3=b.Q3 then 1 else 0 end as Q3,
		case when a.Q4=b.Q4 then 1 else 0 end as Q4,
		case when a.Q5=b.Q5 then 1 else 0 end as Q5
	from have as a,have as b
where strip(a.var)<>'Correct Answer' and strip(b.var)='Correct Answer'
;
quit;

Thanks,
Suryakiran

novinosrin · Posted 12-07-2018 11:48 AM

Your idea may be good, but will not handle efficiently if there are numerous questions and is pretty much hard coding. Even if you generate tons of case when with macro, the sql processor is not efficient or robust in handling wide datasets. Also, an algorithm is only valid if it can scale from Q1 to Qn (where n should be expected to be a huge number)

Of course I agree WRT thread, it seems OP has only 5 questions and it doesn't matter but a solution should handle and scale at ease. The thought of proc sql should also pave way to transpose(transform) and then an inner join and boolean, and finally retranspose will suffice, but wide*wide proc sql would be bizarre for the sql processor/optimizer IMHO.

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
proc transpose data=have out=w;
by var notsorted;
var q:;
run;

proc sql;
create table w1 as
select  a.var,a._name_,a.col1=b.col1 as q
from w(where=(var ne 'Correct Answer')) a, w(where=(var eq 'Correct Answer')) b
where a._name_=b._name_
order by var, _name_;
quit;

proc transpose data=w1 out=want(drop=_name_);
by var notsorted;
var q;
run;

Either way, this is not even close the power of temporary array in datastep with a neat one pass solution

novinosrin · Posted 12-07-2018 12:31 PM


data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;
data want;
set have(where=(var ne 'Correct Answer'));
array qq(*) q:;
if _n_=1 then set have(where=(var eq 'Correct Answer') rename=(q1-q5=t1-t5));
array j(*) t:;
do _n_=1 to dim(j);
qq(_n_)=j(_n_)=qq(_n_);
end;
drop t:;
run;

mkeintz · Posted 12-07-2018 11:51 PM

@novinosrin

You've been busy, with many alternatives, and I see you present the advantages of a (1) conditional SET statement. You can also probably dispense with the "where=" options and just use "firstobs=2".

data have;
input  var & $15. Q1	Q2	Q3	Q4	Q5 ;
cards;
Correct Answer	1	3	4	2	3
Candidate 1		3	3	1	2	1
Candidate 2		1	3	3	2	3
Candidate 3		2	1	2	2	2
;


data want (drop=i ans:);
  if _n_=1 then set have (rename=(q1-q5=ans1-ans5));
  array ans {*} ans:;

  set have (firstobs=2);
  array q {*} q:;

  do i=1 to dim(ans);
    ncorrect=sum(ncorrect,q{i}=ans{i});
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 12-08-2018 12:02 AM

Thanks @mkeintz Nice one indeed. LOL Yes, I was so bored. That's typical of winter break I suppose and we at DePaul are no different. 🙂 You still awake? I m about to doze off. Cya tomorrow

AmarnathEklare · Posted 12-07-2018 11:01 PM

Using Transpose and array we can get the result:

proc transpose data=have out=want name=ID;
ID ID;
VAR Q1-Q5;
run;

data want;
set want;
i=0;
array Answer(1) Correct_Answer;
array Candidate(3) Candidate1-Candidate3;
do i = 1 to 3;
if Answer[1] = Candidate[i] then do;
Candidate[i] = 1;
end;
else do;
Candidate[i] = 0;
end;
end;
drop i;
run;

proc transpose data=want name=ID;
ID ID;
VAR Candidate1-Candidate3;
run;

Compare the value of an observation in row 1 to the value in subsequent rows for the same variable

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Re: Compare the value of an observation in row 1 to the value in subsequent rows for the same variab

Classroom Training Available!