BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

I've read warning about using options of NODUP and NODUPKEY in Proc Sort. That is, both compare the previous observation written to the output data set, so if the observations that you want eliminated are not adjacent in the data set after the sort, they will not be eliminated.

I can understand it being a problem with NODUP, but why it's also a problem with NOUPKEY? Is it true that with NODUPKEY, SAS sorts the data by the variables following BY (say by ID, NUM) and eliminates any duplicates after the first occurrence of the cases. If so, I shouldn't have duplicate cases by ID and NUM, right? Here is an example.

*Records for ID#2 not adjacent and it's sex values are not the same, but I care only about duplicates by ID and NUM;

data aa; input

id sex $ NUM AMOUNT;

datalines;

1 F 3 40

1 F 4 40

2 M 3 20

3 M 3 40

3 M 3 40

2 F 3 20

;

proc print data=aa noobs; run;

proc sort data=aa out=bb nodupkey; by id num; run;

proc print data=bb noobs; run;

/*

1 F 3 40

1 F 4 40

2 M 3 20

3 M 3 40

*/

proc sort data=aa out=cc nodup; by id; run;

proc print data=cc noobs; run;

/*

1 F 3 40

1 F 4 40

2 M 3 20

3 M 3 40

2 F 3 20

*/

Thanks very much. I just wanted to be sure, as I prefer not to use PROC SQL and DISTINCT do to it.

***** BUT while I'm on it, I also have a question about SQL's create table name being the same as the source data name (as I just wanted to overwrite it), such as the following code. SAS would give the following waning, Can I ignore it?

      WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.

proc sql; create table aa as select distinct id, num from aa; quit;

proc print data=aa noobs; run;

Thanks,

2 REPLIES 2
Tom
Super User Tom
Super User

NODUPKEY will eliminate all of the duplicates because it is only checking the BY variables.  Since the data is being sorted by those variables there is no way for identical values to be separated by some other value as can happen with NODUP.

It is generally good practice not to do that just because it will make it hard to figure out what happened when you get unexpected results.

But otherwise you only need to be concerned about the warning when your table is in an external database.  That database might have rules or methods that will prevent writing back to the same table as the source.

Solph
Pyrite | Level 9

Thanks for confirming about NODUPKEY, Much appreciated, also about SQL.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1064 views
  • 0 likes
  • 2 in conversation