- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I've read warning about using options of NODUP and NODUPKEY in Proc Sort. That is, both compare the previous observation written to the output data set, so if the observations that you want eliminated are not adjacent in the data set after the sort, they will not be eliminated.
I can understand it being a problem with NODUP, but why it's also a problem with NOUPKEY? Is it true that with NODUPKEY, SAS sorts the data by the variables following BY (say by ID, NUM) and eliminates any duplicates after the first occurrence of the cases. If so, I shouldn't have duplicate cases by ID and NUM, right? Here is an example.
*Records for ID#2 not adjacent and it's sex values are not the same, but I care only about duplicates by ID and NUM;
data aa; input
id sex $ NUM AMOUNT;
datalines;
1 F 3 40
1 F 4 40
2 M 3 20
3 M 3 40
3 M 3 40
2 F 3 20
;
proc print data=aa noobs; run;
proc sort data=aa out=bb nodupkey; by id num; run;
proc print data=bb noobs; run;
/*
1 F 3 40
1 F 4 40
2 M 3 20
3 M 3 40
*/
proc sort data=aa out=cc nodup; by id; run;
proc print data=cc noobs; run;
/*
1 F 3 40
1 F 4 40
2 M 3 20
3 M 3 40
2 F 3 20
*/
Thanks very much. I just wanted to be sure, as I prefer not to use PROC SQL and DISTINCT do to it.
***** BUT while I'm on it, I also have a question about SQL's create table name being the same as the source data name (as I just wanted to overwrite it), such as the following code. SAS would give the following waning, Can I ignore it?
WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.
proc sql; create table aa as select distinct id, num from aa; quit;
proc print data=aa noobs; run;
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
NODUPKEY will eliminate all of the duplicates because it is only checking the BY variables. Since the data is being sorted by those variables there is no way for identical values to be separated by some other value as can happen with NODUP.
It is generally good practice not to do that just because it will make it hard to figure out what happened when you get unexpected results.
But otherwise you only need to be concerned about the warning when your table is in an external database. That database might have rules or methods that will prevent writing back to the same table as the source.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for confirming about NODUPKEY, Much appreciated, also about SQL.