nmp

 I am apparently having some trouble understanding NODUP. If I run the two following lines of code:

 

proc sort data=have nodup out=ck1; by _all_; run;

proc sort data=have nodup out=ck2; by person_id; run;

 

I get different results: the first step deletes more duplicate records than the second one. As I understood NODUP, it should delete records where all variables are the same and sort by the BY variables. So the sort order might differ, but the deleted records should be the same.

 

Can someone help me understand what is happening? Thank you!

1 ACCEPTED SOLUTION

Tom

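The NODUP option (an alias for NODUPRECS) compares each record only with the previous record written to the output data set, so it eliminates only those duplicate rows that happen to end up adjacent to each other after the sort.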

Example:

PersonID | Age | Wt
1 | 10 | 100
1 | 12 | 200
1 | 10 | 100

Sorting this data by PersonID alone deletes none of the records, since each one differs from the record next to it.

 

When you sort by ALL of the variables, duplicate records are guaranteed to end up next to each other, so all of them are eliminated. If you sort only by PERSON_ID, duplicate records can still be output.
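To see the difference on the three-row example, here is a minimal sketch. The data set name HAVE follows the original question; the output names CK_KEY and CK_ALL and the lowercase variable names are made up for illustration. It assumes the default EQUALS behavior of PROC SORT, which keeps records in their input order within a BY group.

```sas
/* Build the three-row example from the table above */
data have;
  input person_id age wt;
  datalines;
1 10 100
1 12 200
1 10 100
;
run;

/* BY person_id only: within the BY group the rows keep their input order, */
/* so the two identical rows are never adjacent and all three are kept.    */
proc sort data=have nodup out=ck_key;
  by person_id;
run;

/* BY _all_: identical rows sort next to each other, */
/* so one of them is dropped and two rows remain.    */
proc sort data=have nodup out=ck_all;
  by _all_;
run;
```

Comparing the observation counts that the log reports for CK_KEY and CK_ALL should show the same gap the original poster saw between ck2 and ck1.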

 

 

If you just want one record per PERSON_ID then use the NODUPKEY option instead.
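A hedged sketch of the NODUPKEY alternative on the same three-row example. The names ONE_PER_ID and DROPPED are hypothetical; DUPOUT= is optional and is used here only to capture the removed records so they can be inspected.

```sas
/* Same three-row example as in the table above */
data have;
  input person_id age wt;
  datalines;
1 10 100
1 12 200
1 10 100
;
run;

/* NODUPKEY compares only the BY variables, so one record per person_id is kept. */
/* Assuming the default EQUALS option, the kept record is the first one in       */
/* input order; the others are written to the DUPOUT= data set.                  */
proc sort data=have nodupkey out=one_per_id dupout=dropped;
  by person_id;
run;
```

Note that NODUPKEY discards records that differ in the non-key variables (here the Age 12 row), so it answers a different question than NODUP with BY _ALL_.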

nmp

Great, thank you for your prompt and clear reply. I did know about NODUPKEY, but clearly did not understand the process by which NODUP eliminates records. Now I do.

 

Thanks.
