Solved: Re: How to flag a row by comparing different records?

ernie86 · Posted 07-22-2015 06:16 PM

Table 1

Name	Question	Score
Ada	A	98
Ada	B	100
Tony	A	76
Tony	B	60
Kelly	A	81
Kelly	B	90

Table 2

Name	Question	Score	Flag
Ada	A	98
Ada	B	100	Yes
Tony	A	76
Tony	B	60
Kelly	A	81
Kelly	B	90	Yes

I'm trying to flag question B if the person has a higher score on question B than A.

For example, table 1 shows that Ada and Kelly both have higher scores on question B than A, so I want create a table like table 2 with flagging question B where score is higher than question A. Can someone please help me how to code this?

Haikuo · Posted 07-23-2015 11:41 AM

In that case, 2XDOW will help,

data want;

do until (last.name);

set have;

by name;

if question='A' then A_score=score;

if question='B' then B_score=score;

end;

do until (last.name);

set have;

by name;

if if question='B' and B_score > A_score then flag='Yes'; else flag ='No';

output;

end;

drop A_score B_score;

run;

So by doing this, you will be able to compare any two specific scores. Let me know if I misunderstand your need.

View solution in original post

Astounding · Posted 07-22-2015 09:20 PM

Is the order of the observations reliable? That would mean you would always have data for both questions for each NAME, and question B always follows immediately after question A.

Under those conditions, you could code:

data want;

set have;

increase = dif(score);

if question='B' and increase > 0 then flag='Yes';

run;

If there might be other variations that occur in the data, it would still be possible. But it would take a little more programming.

Haikuo · Posted 07-22-2015 09:32 PM

if tie is considered as 'Yes', here is an option:

proc sql;

create table want as

select *, case when question='B' and score=max(score) then 'Yes' else ' ' end as flag

from have

group by name

;

quit;

ernie86 · Posted 07-23-2015 10:21 AM

Thank you for your help.

But what if the dataset has more than two questions to compare. For example, I have question A, B, C, and D, and I want to compare the score of A and C and flag the one with higher score. The part that I'm stuck is I don't know how to choose the two scores from two specific questions. My dataset has more than 100 questions and I need to compare two scores from two specific questions.

Haikuo · Posted 07-23-2015 11:41 AM

In that case, 2XDOW will help,

data want;

do until (last.name);

set have;

by name;

if question='A' then A_score=score;

if question='B' then B_score=score;

end;

do until (last.name);

set have;

by name;

if if question='B' and B_score > A_score then flag='Yes'; else flag ='No';

output;

end;

drop A_score B_score;

run;

So by doing this, you will be able to compare any two specific scores. Let me know if I misunderstand your need.

ernie86 · Posted 07-23-2015 01:29 PM

It works!!!!!! You are awesome. Thank you!

ernie86 · Posted 07-23-2015 04:53 PM

Seems like this can only compare two scores for one name, which mean I can only compare question A & B, not A & B and C & D for the same person. In other words, if I want to compare question A & B and question C & D for Ada then it won't work.

Haikuo · Posted 07-23-2015 08:39 PM

I got lost here. What is your ultimate purpose? comparing a specific pair of answers or comparing any pair of answers? if the latter, then what is the difference from flagging the maximum score out of all of the answer?

ernie86 · Posted 07-23-2015 10:15 PM

I'm sorry that I confused you.

I have multiple pairs of answers that I need to compare per name, such as A compares to B and flag B(the higher score), and C compares to D and flag D (higher score). Not intended to compare A, B , C, D all together. I hope that I clarified.

Ksharp · Posted 07-24-2015 08:08 AM

You need to specify how many matched groups are there .

Code: Program.sas

data have;
infile cards truncover expandtabs;
input Name $ Question $ Score;
cards;
Ada A 98
Ada B 100
Ada C 98
Ada D 100
Tony A 76
Tony B 60
Kelly A 81
Kelly B 90
;
run;

data temp;
 set have;
 if Question in ('A' 'B') then group=1;
  else if Question in ('C' 'D') then group=2;
run;
proc sort data=temp;by Name group;run;
data want;
 merge temp temp(rename=(Score=_Score Question=_Q) where=(_Q in ('A' 'C')));
 by Name group;
 if Score gt _Score then flag='Y';
 drop _Score _Q group ;
run;

Xia Keshan

Haikuo · Posted 07-24-2015 12:20 PM

You will have several options.

1. One pair a time. Make a Macro out of the code I posted, use answers as input parameters for the macro.

2. All pairs at one time. Then like @xia keshan mentioned, you need to know the total number of the pairs, then create multiple flag variables and temp score variables for each pair respectively.

ernie86 · Posted 07-23-2015 10:11 AM

what if there are more than two questions (ex. question A, B, C, D)? I want to compare question A and question C which has the higher score and flag the row with higher score. It seems like your solution work only if there are two questions. How to select the particular questions to compare the score is my major issue. Thank you in advance.

Brent_PLNU · Posted 07-23-2015 01:18 PM

This may be "old school," but I would use a PROC TRANSPOSE to transpose the different score rows to columns with a resulting file having one row per individual. Columns for SCOREA, SCOREB, SCOREC, etc. The you can use a MAX function to get the highest of the SCORE* values. Does that make sense?

Code: Program.sas

Registration is open

SAS Training: Just a Click Away