veda8
Fluorite | Level 6

Do we always have to sort by _ALL_ to use NODUPRECS?

 

Accepted Solution
Tom
Super User

@veda8 wrote:

Do we always have to sort by _ALL_ to use NODUPRECS?

 


You don't "have" to; SAS will happily let you use any subset of the variables in the BY statement.

But if you want the result to eliminate all duplicate records, you do.

The reason is that the DUP check (or, as it has been renamed, the DUPRECS check) only compares adjacent records: each observation is compared with the previous one written to the output. So if you sort by only a subset of the variables, it is possible for two records that are exactly the same to both be output. They just need at least one observation between them that has the same BY values but differs on some non-BY variable.
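
A minimal sketch of the adjacent-record behavior described above. The data set names (work.demo, work.by_id, work.by_all) and the values are made up for illustration, and the sketch assumes the default EQUALS behavior, which keeps the input order within each BY group.

```sas
/* Hypothetical data: rows 1 and 3 are identical but not adjacent */
data work.demo;
   input id var;
   datalines;
1 10
1 20
1 10
;

/* BY a subset: within id=1 the input order is kept, so the two   */
/* (1,10) rows stay separated by (1,20) and are never compared.   */
/* Expected: all 3 observations are written to work.by_id.        */
proc sort data=work.demo out=work.by_id noduprecs;
   by id;
run;

/* BY _ALL_: identical rows become adjacent after sorting.        */
/* Expected: 2 observations in work.by_all, (1,10) and (1,20).    */
proc sort data=work.demo out=work.by_all noduprecs;
   by _all_;
run;
```

Comparing the observation counts that the log reports for the two output data sets should show the difference.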


6 REPLIES
PeterClemmensen
Tourmaline | Level 20

No:

 

data have;
input ID var;
datalines;
1 10
1 20
1 10
3 50
3 50
3 50
2 30
2 30
2 40
;

/* NODUPRECS runs with any BY list, but it only compares adjacent observations */
proc sort data=have noduprecs;
   by ID;
run;

veda8
Fluorite | Level 6

When I use NODUPKEY and give two variables in the BY statement, e.g.:

by id var;

which variable(s) are considered the duplicate key?

ballardw
Super User

@veda8 wrote:

When I use NODUPKEY and give two variables in the BY statement, e.g.:

by id var;

which variable(s) are considered the duplicate key?


Both.

The "key" is whatever is on the BY statement.

 

NODUPKEY

checks for and eliminates observations with duplicate BY values. If you specify this option, then PROC SORT compares all BY values for each observation to the ones for the previous observation that is written to the output data set. If an exact match is found, then the observation is not written to the output data set.
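
As a rough illustration of the definition above, here is a small sketch with two BY variables. The data set names and the extra variable other are hypothetical; the sketch assumes the default EQUALS behavior, so the first observation in input order is the one kept for each BY group.

```sas
/* Hypothetical data: the first two rows share id and var but differ in other */
data work.demo;
   input id var other $;
   datalines;
1 10 a
1 10 b
1 20 c
2 30 d
;

/* The key is the combination of id and var, not either one alone. */
/* Expected: 3 observations are kept: (1,10,a), (1,20,c), (2,30,d). */
/* Row (1,10,b) is dropped because its id and var match the         */
/* previous observation that was written.                           */
proc sort data=work.demo out=work.nodupkey_out nodupkey;
   by id var;
run;
```

Note that other plays no part in the comparison, so which of the two (1,10) rows survives depends only on their order.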

Reeza
Super User

You have to use a double sort with NODUPRECS as well. It is no longer supported, and you shouldn't use it in production code going forward. Use NODUPKEY instead. SAS takes a while to deprecate features, but NODUPRECS has been removed from the documentation and is currently maintained only for backwards compatibility.

https://documentation.sas.com/?docsetId=proc&docsetVersion=9.4&docsetTarget=p02bhn81rn4u64n1b6l00ftd...
Ksharp
Super User

I think the answer is yes.
And it is better NOT to use NODUPRECS; try NODUPKEY with BY _ALL_ instead.
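
A minimal sketch of that suggestion, with made-up data. The data set names are hypothetical, and the DUPOUT= option is an addition not mentioned in the thread; it is included only so the removed rows can be inspected.

```sas
/* Hypothetical data with both adjacent and non-adjacent duplicates */
data work.demo;
   input id var;
   datalines;
1 10
1 20
1 10
3 50
3 50
;

/* With BY _ALL_, every variable is part of the key, so NODUPKEY   */
/* removes fully duplicated records.                               */
/* Expected: work.unique has (1,10), (1,20), (3,50);               */
/*           work.removed has the two dropped rows (1,10), (3,50). */
proc sort data=work.demo out=work.unique dupout=work.removed nodupkey;
   by _all_;
run;
```

Because every variable is in the BY list, identical records always end up adjacent, so the concern about records hiding between non-matching neighbors does not arise.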