BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chelseaxxlutz
Obsidian | Level 7

Hello all,

 

I have tried to search for the solution online for hours now, but all examples seem to be geared towards deleting duplicates where the whole row, or all of the row minus one variable, match. I need to delete duplicate IDs and their corresponding rows when the other variables in the duplicate ID rows do NOT match, and keep/toss the duplicate based on the value of one specific variable, while also maintaining all my other variables.

 

A                 B                     C                      D                     

1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5

 

As you can see, there are varying combinations of B for each duplicate value of A. For the B's that are all 0s or all 1s, I don't care, and just need one of them; for those that have a mix of 0s and 1s, I need to keep the 1. I also need to make sure C D and E are kept in the database, but the values those variables take don't matter - I just need whatever ones correspond to a 1 in B. In context, B is a test result (+/-) and I am trying to create a "any positive result ever" variable. I have tried every variation of dupkey, proc sort, and proc sql that I can think of. 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Will a double Proc sort suffice?

 

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

 

View solution in original post

3 REPLIES 3
novinosrin
Tourmaline | Level 20

Will a double Proc sort suffice?

 

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

 

chelseaxxlutz
Obsidian | Level 7

Hi @novinosrin 

Oh my gosh. I swear, every time I tried proc sort before, it deleted randomly and I lost positive test results. I don't know what subtle difference is in your code, but it worked! THANK YOU!!

 

chelseaxxlutz
Obsidian | Level 7
If the extra information helps:
There are 944 observations in the dataset, 736 of which are unique IDs. There are 59 positive tests, 2 of which belong to the same ID (so 57 unique positive tests). All other duplicate IDs are all either all 0s (neg tests) or a 1 and 0.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 852 views
  • 0 likes
  • 2 in conversation