BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chelseaxxlutz
Obsidian | Level 7

Hello all,

 

I have tried to search for the solution online for hours now, but all examples seem to be geared towards deleting duplicates where the whole row, or all of the row minus one variable, match. I need to delete duplicate IDs and their corresponding rows when the other variables in the duplicate ID rows do NOT match, and keep/toss the duplicate based on the value of one specific variable, while also maintaining all my other variables.

 

A                 B                     C                      D                     

1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5

 

As you can see, there are varying combinations of B for each duplicate value of A. For the B's that are all 0s or all 1s, I don't care, and just need one of them; for those that have a mix of 0s and 1s, I need to keep the 1. I also need to make sure C D and E are kept in the database, but the values those variables take don't matter - I just need whatever ones correspond to a 1 in B. In context, B is a test result (+/-) and I am trying to create a "any positive result ever" variable. I have tried every variation of dupkey, proc sort, and proc sql that I can think of. 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Will a double Proc sort suffice?

 

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

 

View solution in original post

3 REPLIES 3
novinosrin
Tourmaline | Level 20

Will a double Proc sort suffice?

 

data have;
input A                 B                     C                      D     ;               
cards;
1                 1                      1                      1          

1                 0                      2                      1                                                     

2                 1                      3                      2

3                 0                      1                      3

3                 0                      5                      3

3                 1                      2                      3

4                 1                      1                      4

4                 1                      2                      4

5                 0                      3                      5

5                 0                      4                      5
;
proc sort data=have out=_have;
by a descending b;
run;

proc sort data=_have out=want nodupkey;
by a;
run;

 

chelseaxxlutz
Obsidian | Level 7

Hi @novinosrin 

Oh my gosh. I swear, every time I tried proc sort before, it deleted randomly and I lost positive test results. I don't know what subtle difference is in your code, but it worked! THANK YOU!!

 

chelseaxxlutz
Obsidian | Level 7
If the extra information helps:
There are 944 observations in the dataset, 736 of which are unique IDs. There are 59 positive tests, 2 of which belong to the same ID (so 57 unique positive tests). All other duplicate IDs are all either all 0s (neg tests) or a 1 and 0.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 578 views
  • 0 likes
  • 2 in conversation