Table 1
Name | Question | Score |
---|---|---|
Ada | A | 98 |
Ada | B | 100 |
Tony | A | 76 |
Tony | B | 60 |
Kelly | A | 81 |
Kelly | B | 90 |
Table 2
Name | Question | Score | Flag |
---|---|---|---|
Ada | A | 98 | |
Ada | B | 100 | Yes |
Tony | A | 76 | |
Tony | B | 60 | |
Kelly | A | 81 | |
Kelly | B | 90 | Yes |
I'm trying to flag question B if the person has a higher score on question B than A.
For example, table 1 shows that Ada and Kelly both have higher scores on question B than A, so I want create a table like table 2 with flagging question B where score is higher than question A. Can someone please help me how to code this?
In that case, 2XDOW will help,
data want;
do until (last.name);
set have;
by name;
if question='A' then A_score=score;
if question='B' then B_score=score;
end;
do until (last.name);
set have;
by name;
if if question='B' and B_score > A_score then flag='Yes'; else flag ='No';
output;
end;
drop A_score B_score;
run;
So by doing this, you will be able to compare any two specific scores. Let me know if I misunderstand your need.
Is the order of the observations reliable? That would mean you would always have data for both questions for each NAME, and question B always follows immediately after question A.
Under those conditions, you could code:
data want;
set have;
increase = dif(score);
if question='B' and increase > 0 then flag='Yes';
run;
If there might be other variations that occur in the data, it would still be possible. But it would take a little more programming.
if tie is considered as 'Yes', here is an option:
proc sql;
create table want as
select *, case when question='B' and score=max(score) then 'Yes' else ' ' end as flag
from have
group by name
;
quit;
Thank you for your help.
But what if the dataset has more than two questions to compare. For example, I have question A, B, C, and D, and I want to compare the score of A and C and flag the one with higher score. The part that I'm stuck is I don't know how to choose the two scores from two specific questions. My dataset has more than 100 questions and I need to compare two scores from two specific questions.
In that case, 2XDOW will help,
data want;
do until (last.name);
set have;
by name;
if question='A' then A_score=score;
if question='B' then B_score=score;
end;
do until (last.name);
set have;
by name;
if if question='B' and B_score > A_score then flag='Yes'; else flag ='No';
output;
end;
drop A_score B_score;
run;
So by doing this, you will be able to compare any two specific scores. Let me know if I misunderstand your need.
It works!!!!!! You are awesome. Thank you!
Seems like this can only compare two scores for one name, which mean I can only compare question A & B, not A & B and C & D for the same person. In other words, if I want to compare question A & B and question C & D for Ada then it won't work.
I got lost here. What is your ultimate purpose? comparing a specific pair of answers or comparing any pair of answers? if the latter, then what is the difference from flagging the maximum score out of all of the answer?
I'm sorry that I confused you.
I have multiple pairs of answers that I need to compare per name, such as A compares to B and flag B(the higher score), and C compares to D and flag D (higher score). Not intended to compare A, B , C, D all together. I hope that I clarified.
You need to specify how many matched groups are there .
data have;
infile cards truncover expandtabs;
input Name $ Question $ Score;
cards;
Ada A 98
Ada B 100
Ada C 98
Ada D 100
Tony A 76
Tony B 60
Kelly A 81
Kelly B 90
;
run;
data temp;
set have;
if Question in ('A' 'B') then group=1;
else if Question in ('C' 'D') then group=2;
run;
proc sort data=temp;by Name group;run;
data want;
merge temp temp(rename=(Score=_Score Question=_Q) where=(_Q in ('A' 'C')));
by Name group;
if Score gt _Score then flag='Y';
drop _Score _Q group ;
run;
Xia Keshan
You will have several options.
1. One pair a time. Make a Macro out of the code I posted, use answers as input parameters for the macro.
2. All pairs at one time. Then like @xia keshan mentioned, you need to know the total number of the pairs, then create multiple flag variables and temp score variables for each pair respectively.
what if there are more than two questions (ex. question A, B, C, D)? I want to compare question A and question C which has the higher score and flag the row with higher score. It seems like your solution work only if there are two questions. How to select the particular questions to compare the score is my major issue. Thank you in advance.
This may be "old school," but I would use a PROC TRANSPOSE to transpose the different score rows to columns with a resulting file having one row per individual. Columns for SCOREA, SCOREB, SCOREC, etc. The you can use a MAX function to get the highest of the SCORE* values. Does that make sense?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.