save the duplicates

Smitha9 · Posted 10-14-2022 01:40 PM

Hi,

I have a dataset

ID vin zip

11221 7896 43567

11221 7654 43567

13456 5433 41323

16754 6432 51678

I want to remove the duplicates and want to save what I removed. in a separate dataset.

Removed_duplicates

ID vin zip

13456 5433 41323

16754 6432 51678

thanks in advance.

PeterClemmensen · Posted 10-14-2022 01:52 PM

Try this

data have;
input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;

proc sort data = have nodupkey dupout = dups;
   by _ALL_;
run;

Result:

have

ID     vin   zip
11221  7654  43567
11221  7896  43567
13456  5433  41323
16754  6432  51678

dups

ID     vin   zip
13456  5433  41323
16754  6432  51678

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 10-14-2022 01:57 PM

Alternatively, a data step approach

data have dups;

   if _N_ = 1 then do;
      dcl hash h(dataset : 'have(obs = 0)');
      h.definekey(all : 'Y');
      h.definedone();
   end;

   set have;

   if h.add() = 0 then output have;
   else                output dups;

run;

The DATA to DATA Step Macro
Blog: SASnrd

ballardw · Posted 10-14-2022 02:04 PM

Maybe this gets you started. Please note the DATA step code to provide example data that we can test code against.

This removes all records with duplicate values.

data have;
   input ID vin zip ;
datalines;
11221 7896  43567
11221 7654 43567
13456 5433  41323
13456 5433  41323
16754  6432  51678
16754 6432 51678
;

proc sort data=have out=duplicates 
     uniqueout=Havesort nouniquekey ;
  by _all_;
run;

The Havesort data set in the output is the sorted values of the Unique or "not duplicated" values. The Duplicates set will have all the records with the duplicate values, not just one.

This leaves one value of each of the duplicates in the Want set with the remainder of the duplicates in the Dupes set:

proc sort data=have out=want 
     dupout=dupes nodupkey ;
  by _all_;
run;

If this doesn't do what you want then provide a clearer example as to what the set without duplicates should look like.

save the duplicates

Re: save the duplicates

Re: save the duplicates

Re: save the duplicates

save the duplicates

Re: save the duplicates

Re: save the duplicates

Re: save the duplicates

Click image to register for webinar

Classroom Training Available!