topic Re: save the duplicates in SAS Programming

save the duplicates

Smitha9 — Fri, 14 Oct 2022 17:40:31 GMT

Hi,

I have a dataset

ID vin zip

11221 7896 43567

11221 7654 43567

13456 5433 41323

16754 6432 51678

I want to remove the duplicates and save what I removed in a separate dataset.

Removed_duplicates

ID vin zip

13456 5433 41323

16754 6432 51678

Thanks in advance.

Re: save the duplicates

PeterClemmensen — Fri, 14 Oct 2022 17:54:25 GMT

Try this:

data have;
input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;

proc sort data = have nodupkey dupout = dups;
   by _ALL_;
run;

Result:

have

ID     vin   zip
11221  7654  43567
11221  7896  43567
13456  5433  41323
16754  6432  51678

dups

ID     vin   zip
13456  5433  41323
16754  6432  51678

Re: save the duplicates

PeterClemmensen — Fri, 14 Oct 2022 17:57:40 GMT

Alternatively, here is a DATA step approach using a hash object:

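data have dups;

   /* On the first iteration, declare an empty hash keyed on every variable in HAVE */
   if _N_ = 1 then do;
      dcl hash h(dataset : 'have(obs = 0)');
      h.definekey(all : 'Y');
      h.definedone();
   end;

   set have;

   /* add() returns 0 when the row is new and nonzero when it is already in the hash */
   if h.add() = 0 then output have;
   else                output dups;

run;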
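
A quick sanity check for either approach, as a sketch only: both the PROC SORT step and this DATA step overwrite have, so this assumes the have DATA step with the six datalines was rerun before the approach being checked. The aliases n_kept and n_dups are illustrative names, not part of the original code.

```sas
/* Count the rows kept and the rows set aside as duplicates */
proc sql;
   select count(*) as n_kept from have;
   select count(*) as n_dups from dups;
quit;
```

With the six sample rows (four of them distinct), this should report 4 kept rows and 2 duplicates.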

Re: save the duplicates

ballardw — Fri, 14 Oct 2022 18:04:17 GMT

Maybe this gets you started. Please note the DATA step code below: it provides example data that we can test code against.

This separates every record that has duplicated values from the records that occur only once:

data have;
   input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;

proc sort data=have out=duplicates 
     uniqueout=Havesort nouniquekey ;
  by _all_;
run;

The Havesort data set contains the sorted unique ("not duplicated") records. The Duplicates data set contains every record whose values are duplicated, not just one copy of each.
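
To see ahead of time which value combinations would land in Duplicates, a PROC SQL count per combination can help. This is only a sketch: n_copies is an illustrative alias, and it assumes the have data set created by the DATA step above.

```sas
/* List each ID/vin/zip combination that occurs more than once */
proc sql;
   select ID, vin, zip, count(*) as n_copies
   from have
   group by ID, vin, zip
   having count(*) > 1;
quit;
```

For the sample data this lists the 13456 and 16754 combinations, each with two copies.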

This leaves one record from each set of duplicates in the Want data set and puts the remaining copies in the Dupes data set:

proc sort data=have out=want 
     dupout=dupes nodupkey ;
  by _all_;
run;

If this doesn't do what you want, then provide a clearer example of what the set without duplicates should look like.
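
For instance, if "duplicate" means a repeated ID rather than a fully repeated row, the BY statement would list only ID. Here is a sketch under that assumption, using the four rows from the original question; the data set names posted, want_by_id and dups_by_id are made up for illustration.

```sas
/* The four rows exactly as posted in the question */
data posted;
   input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
16754 6432 51678
;

/* Keep one row per ID; send the other rows for that ID to dups_by_id */
proc sort data=posted out=want_by_id
     dupout=dups_by_id nodupkey;
   by ID;
run;
```

Here want_by_id should get three rows (one per ID) and dups_by_id the other 11221 row. With by _all_ instead, nothing would be removed from these four rows, because no two of them match in every variable.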