Hi,
I have a dataset
ID vin zip
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
I want to remove the duplicates and want to save what I removed. in a separate dataset.
Removed_duplicates
ID vin zip
13456 5433 41323
16754 6432 51678
thanks in advance.
Try this
data have;
input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;
proc sort data = have nodupkey dupout = dups;
by _ALL_;
run;
Result:
have ID vin zip 11221 7654 43567 11221 7896 43567 13456 5433 41323 16754 6432 51678 dups ID vin zip 13456 5433 41323 16754 6432 51678
Alternatively, a data step approach
data have dups;
if _N_ = 1 then do;
dcl hash h(dataset : 'have(obs = 0)');
h.definekey(all : 'Y');
h.definedone();
end;
set have;
if h.add() = 0 then output have;
else output dups;
run;
Maybe this gets you started. Please note the DATA step code to provide example data that we can test code against.
This removes all records with duplicate values.
data have; input ID vin zip ; datalines; 11221 7896 43567 11221 7654 43567 13456 5433 41323 13456 5433 41323 16754 6432 51678 16754 6432 51678 ; proc sort data=have out=duplicates uniqueout=Havesort nouniquekey ; by _all_; run;
The Havesort data set in the output is the sorted values of the Unique or "not duplicated" values. The Duplicates set will have all the records with the duplicate values, not just one.
This leaves one value of each of the duplicates in the Want set with the remainder of the duplicates in the Dupes set:
proc sort data=have out=want dupout=dupes nodupkey ; by _all_; run;
If this doesn't do what you want then provide a clearer example as to what the set without duplicates should look like.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.