Solved: Delete duplicate ID, but when now other variables in the row match

chelseaxxlutz · Posted 04-24-2020 02:33 PM

Hello all,

I have tried to search for the solution online for hours now, but all examples seem to be geared towards deleting duplicates where the whole row, or all of the row minus one variable, match. I need to delete duplicate IDs and their corresponding rows when the other variables in the duplicate ID rows do NOT match, and keep/toss the duplicate based on the value of one specific variable, while also maintaining all my other variables.

A B C D

1 1 1 1

1 0 2 1

2 1 3 2

3 0 1 3

3 0 5 3

3 1 2 3

4 1 1 4

4 1 2 4

5 0 3 5

5 0 4 5

As you can see, there are varying combinations of B for each duplicate value of A. For the B's that are all 0s or all 1s, I don't care, and just need one of them; for those that have a mix of 0s and 1s, I need to keep the 1. I also need to make sure C D and E are kept in the database, but the values those variables take don't matter - I just need whatever ones correspond to a 1 in B. In context, B is a test result (+/-) and I am trying to create a "any positive result ever" variable. I have tried every variation of dupkey, proc sort, and proc sql that I can think of.

Thank you!

novinosrin · Posted 04-24-2020 03:06 PM

Will a double Proc sort suffice?

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

View solution in original post

novinosrin · Posted 04-24-2020 03:06 PM

Will a double Proc sort suffice?

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

chelseaxxlutz · Posted 04-24-2020 03:17 PM

Hi @novinosrin

Oh my gosh. I swear, every time I tried proc sort before, it deleted randomly and I lost positive test results. I don't know what subtle difference is in your code, but it worked! THANK YOU!!

chelseaxxlutz · Posted 04-24-2020 03:12 PM

If the extra information helps:
There are 944 observations in the dataset, 736 of which are unique IDs. There are 59 positive tests, 2 of which belong to the same ID (so 57 unique positive tests). All other duplicate IDs are all either all 0s (neg tests) or a 1 and 0.

Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

Re: Delete duplicate ID, but when now other variables in the row match

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!