I want to dedupe the duplicate codes based on their Score, the highest score stays followed by sorting high to low of score. But, if the score is the same & reason codes are different, I want to maintain the order of reason codes. The dedupe only happens across one id, ID - 1 would *not* look for duplicates in ID - 2
Example -
HAVE -
data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
run;
ID | NAME | SCORE | CODE |
1 | John | 23 | A05 |
1 | John | 26 | A02 |
1 | John | 26 | A03 |
1 | John | 26 | A01 |
2 | Mark | 21 | A05 |
2 | Mark | 22 | A05 |
2 | Mark | 23 | A07 |
WANT -
ID | NAME | SCORE | CODE |
1 | John | 26 | A02 [ Same Order as HAVE ] |
1 | John | 26 | A03 |
1 | John | 26 | A01 |
1 | John | 23 | A05 |
2 | Mark | 23 | A07 |
2 | Mark | 22 | A05 |
If the original order is important then add a variables to indicate the original order.
You could do it when you create the dataset.
data have;
input id name $ score code $;
row+1;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
Or add it later. Perhaps as a VIEW?
data have_order / view=have_order ;
set have;
row+1;
run;
Now sort by ID NAME CODE and descending SCORE. Then you can de-dupe per ID NAME CODE. Then you can re-sort by the ID NAME descending SCORE ROW to get the codes in the original order.
proc sort data=have out=step1 ;
by id name code descending score ;
run;
proc sort data=step1 out=step2 nodupkey;
by id name code ;
run;
proc sort data=step2 out=want;
by id name descending score row ;
run;
Obs id name score code row 1 1 John 26 A02 2 2 1 John 26 A03 3 3 1 John 26 A01 4 4 1 John 23 A05 1 5 2 Mark 23 A07 7 6 2 Mark 22 A05 6
Not sure I get exactly what you want to do.
But if sort order is crucial for you application, you should have a unique (row?) id column in your data set.
Then do de-duplication, and then to be sure sort it by the earlier assigned id.
Hi @shubham_d
I think this gets you most of the way there. Could you expand on the requirements for the custom sort a bit more? You're only applying a sort on part of the data
*Read data in;
data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
run;
*Create key variable;
data want;
set have;
key = name || code;
run;
*Sort by key;
proc sort data=want ; by key;run;
*Only retain first key (i.e. highest score);
data want;
set want;
by key;
if first.key then output;
drop key;
run;
*Sort output data;
proc sort data=want ; by name descending score;run;
Thanks
Harry
Hello @shubham_d,
Here's another suggestion: First, sort by ID and descending score, then remove duplicates.
proc sort data=have out=temp;
by id descending score;
run;
data want(drop=_:);
array _c[999] $; /* increase dimension if >999 obs. per ID */
do _i=1 by 1 until(last.id);
set temp;
by id;
if code ~in _c then do;
output;
_c[_i]=code;
end;
end;
run;
If performance is an issue, you may want to use a _temporary_ array or a hash object or at least adapt dimension and variable length of array _c.
I think I'm understanding your logic. You're deduplicating by ID - Code, right? And then you want to just sort the data? Agree with @LinusH about adding a temporary row variable to help with restoring the sort order.
data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
run;
*add a row counter, so can restore sort order at the end ;
data want1 ;
set have ;
row++1 ;
run ;
*Deduplicate by id-code. ;
*Select the code with the maximum score ;
proc sort data=want1 ;
by id code descending score ;
run ;
data want ;
set want1 ;
by id code descending score ;
if first.code ;
run ;
*sort to desired order;
proc sort data=want ;
by id descending score row ;
run ;
proc print data=want;
run ;
If the original order is important then add a variables to indicate the original order.
You could do it when you create the dataset.
data have;
input id name $ score code $;
row+1;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
Or add it later. Perhaps as a VIEW?
data have_order / view=have_order ;
set have;
row+1;
run;
Now sort by ID NAME CODE and descending SCORE. Then you can de-dupe per ID NAME CODE. Then you can re-sort by the ID NAME descending SCORE ROW to get the codes in the original order.
proc sort data=have out=step1 ;
by id name code descending score ;
run;
proc sort data=step1 out=step2 nodupkey;
by id name code ;
run;
proc sort data=step2 out=want;
by id name descending score row ;
run;
Obs id name score code row 1 1 John 26 A02 2 2 1 John 26 A03 3 3 1 John 26 A01 4 4 1 John 23 A05 1 5 2 Mark 23 A07 7 6 2 Mark 22 A05 6
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.