Obsidian | Level 7

## Delete duplicate ID, but when no other variables in the row match

Hello all,

I have tried to search for the solution online for hours now, but all examples seem to be geared towards deleting duplicates where the whole row, or all of the row minus one variable, match. I need to delete duplicate IDs and their corresponding rows when the other variables in the duplicate ID rows do NOT match, and keep/toss the duplicate based on the value of one specific variable, while also maintaining all my other variables.

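| A | B | C | D |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 2 | 1 |
| 2 | 1 | 3 | 2 |
| 3 | 0 | 1 | 3 |
| 3 | 0 | 5 | 3 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 1 | 4 |
| 4 | 1 | 2 | 4 |
| 5 | 0 | 3 | 5 |
| 5 | 0 | 4 | 5 |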

As you can see, there are varying combinations of B for each duplicated value of A. For IDs where B is all 0s or all 1s, I don't care which row survives and just need one of them; for IDs with a mix of 0s and 1s, I need to keep a row where B is 1. I also need to make sure C, D, and E are kept in the dataset, but the values those variables take don't matter - I just need whichever ones correspond to a 1 in B. In context, B is a test result (+/-) and I am trying to create an "any positive result ever" variable. I have tried every variation of dupkey, proc sort, and proc sql that I can think of.

Thank you!

Accepted Solutions
Tourmaline | Level 20

## Re: Delete duplicate ID, but when no other variables in the row match

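Will a double PROC SORT suffice? The first sort orders the rows by A and descending B, so any row with B=1 comes first within its ID; the second sort uses NODUPKEY on A alone to keep only that first row per ID.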

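```sas
/* Sample data from the question */
data have;
  input A B C D;
  cards;
1 1 1 1
1 0 2 1
2 1 3 2
3 0 1 3
3 0 5 3
3 1 2 3
4 1 1 4
4 1 2 4
5 0 3 5
5 0 4 5
;
run;

/* Step 1: within each A, put rows with B=1 ahead of rows with B=0 */
proc sort data=have out=_have;
  by a descending b;
run;

/* Step 2: keep the first row per A. With the default EQUALS behavior,
   PROC SORT preserves the incoming order within each BY group, so the
   row that survives NODUPKEY is the B=1 row whenever one exists. */
proc sort data=_have out=want nodupkey;
  by a;
run;
```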
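
For comparison, the second sort can be swapped for a DATA step that keeps the first row of each BY group. This is only a sketch of an alternative, not part of the accepted answer; the dataset name `sorted` is an assumption introduced here, and `have` and `want` follow the example above.

```sas
/* Sort once so that a B=1 row, if any, leads each A group */
proc sort data=have out=sorted;
  by a descending b;
run;

/* Keep only the first row of each A group; all other variables
   (C, D, ...) ride along unchanged with that row */
data want;
  set sorted;
  by a;
  if first.a;
run;
```

Both versions should give the same result on the sample data; the DATA step form simply makes the "first row per ID" rule explicit instead of relying on NODUPKEY.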

Obsidian | Level 7

## Re: Delete duplicate ID, but when no other variables in the row match

Oh my gosh. I swear, every time I tried proc sort before, it deleted randomly and I lost positive test results. I don't know what subtle difference is in your code, but it worked! THANK YOU!!
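
The earlier attempts are not shown in the thread, so the following is only a guess at the "subtle difference": a single NODUPKEY sort on A alone keeps whichever row comes first for each ID in the input order, regardless of B. A minimal sketch, assuming the `have` dataset from the accepted solution; `first_seen` and `per_pair` are hypothetical output names.

```sas
/* Single sort on A only: for A=3 in the sample data the first input
   row has B=0, so the positive row (B=1) would be the one discarded */
proc sort data=have out=first_seen nodupkey;
  by a;
run;

/* Sorting by A and B with NODUPKEY removes duplicates of the (A, B)
   pair instead, so an ID with both a 0 and a 1 keeps two rows */
proc sort data=have out=per_pair nodupkey;
  by a b;
run;
```

Sorting by `a descending b` first, without NODUPKEY, is what guarantees that the row kept by the second step is a positive one whenever the ID has one.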

Obsidian | Level 7

## Re: Delete duplicate ID, but when no other variables in the row match

If the extra information helps:
There are 944 observations in the dataset, 736 of which are unique IDs. There are 59 positive tests, 2 of which belong to the same ID (so 57 unique positive tests). Every other duplicated ID has either all 0s (negative tests) or a 1 and a 0.
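
Those figures give a quick way to sanity-check the deduplicated output. The query below is a sketch that assumes the result is named `want` and the variables are `A` (ID) and `B` (0/1 test result), as in the example; the real dataset may use different names.

```sas
/* One row per ID is expected, so n_rows and n_ids should agree.
   n_positive counts the IDs whose surviving row has B=1. */
proc sql;
  select count(*)          as n_rows,
         count(distinct a) as n_ids,
         sum(b)            as n_positive
  from want;
quit;
```

If the counts quoted above hold, `n_rows` and `n_ids` should both be 736 and `n_positive` should match the number of unique positive IDs.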