I am apparently having some trouble understanding NODUP. If I run the two following lines of code:
proc sort data=have nodup out=ck1; by _all_; run;
proc sort data=have nodup out=ck2; by person_id; run;
I get different results: the first line deletes more duplicate records than the second one. As I understood NODUP, it should delete records where all variables are the same, and sort by the BY variables. So the sort order might be different, but the deleted records should be the same.
Can someone help me understand what is happening? Thank you!
The NODUP option only eliminates a duplicate row when it happens to land immediately after an identical row in the sorted output. Each row is compared with the previous row written out, not with every row in the data set.
Example:
PersonID | Age | Wt
1 | 10 | 100
1 | 12 | 200
1 | 10 | 100
None of the records are deleted since each is different from the one next to it.
When you sort by ALL of the variables, duplicate records are guaranteed to end up next to each other, so all of them are eliminated. If you only sort by PERSON_ID, duplicate records within the same PERSON_ID can be separated by other records and still get output.
If you just want one record per PERSON_ID then use the NODUPKEY option instead.
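Here is a small sketch of the three cases using the example rows above. The data set name HAVE and the variable PERSON_ID come from the question; the variables AGE and WT and the output names CK_ID, CK_ALL and CK_KEY are made up for illustration.

```sas
/* Hypothetical data set holding the three example rows */
data have;
  input person_id age wt;
  datalines;
1 10 100
1 12 200
1 10 100
;
run;

/* BY person_id only: all rows tie on the key, so they keep their    */
/* input order and the two identical rows are not adjacent. As in    */
/* the example above, NODUP should not remove either of them.        */
proc sort data=have nodup out=ck_id;
  by person_id;
run;

/* BY _all_: identical rows sort next to each other, so NODUP drops  */
/* the second copy and CK_ALL ends up with 2 rows.                   */
proc sort data=have nodup out=ck_all;
  by _all_;
run;

/* NODUPKEY compares only the BY variables, so CK_KEY keeps a single */
/* row for person_id 1.                                              */
proc sort data=have nodupkey out=ck_key;
  by person_id;
run;
```

Comparing the row counts of the three output data sets in the log should make the difference visible.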
Great, thank you for your prompt and clear reply. I did know about NODUPKEY, but clearly did not understand the process by which NODUP eliminates records. Now I do.
Thanks.