BookmarkSubscribeRSS Feed
Smitha9
Fluorite | Level 6

Hi,

I have a dataset

ID                 vin                   zip             

11221         7896            43567

11221         7654             43567

13456         5433              41323

13456         5433              41323

16754          6432              51678

16754         6432               51678

 

I want to remove the duplicates and want to save what I removed. in a separate dataset.

Removed_duplicates

ID               vin                     zip

13456        5433               41323

16754        6432                51678

 

thanks in advance.

 

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;

proc sort data = have nodupkey dupout = dups;
   by _ALL_;
run;

 

Result:

 

have

ID     vin   zip
11221  7654  43567
11221  7896  43567
13456  5433  41323
16754  6432  51678

dups

ID     vin   zip
13456  5433  41323
16754  6432  51678
PeterClemmensen
Tourmaline | Level 20

Alternatively, a data step approach

 

data have dups;

   if _N_ = 1 then do;
      dcl hash h(dataset : 'have(obs = 0)');
      h.definekey(all : 'Y');
      h.definedone();
   end;

   set have;

   if h.add() = 0 then output have;
   else                output dups;

run;
ballardw
Super User

Maybe this gets you started. Please note the DATA step code to provide example data that we can test code against.

This removes all records with duplicate values.

data have;
   input ID vin zip ;
datalines;
11221 7896  43567
11221 7654 43567
13456 5433  41323
13456 5433  41323
16754  6432  51678
16754 6432 51678
;

proc sort data=have out=duplicates 
     uniqueout=Havesort nouniquekey ;
  by _all_;
run;

The Havesort data set in the output is the sorted values of the Unique or "not duplicated" values. The Duplicates set will have all the records with the duplicate values, not just one.

This leaves one value of each of the duplicates in the Want set with the remainder of the duplicates in the Dupes set:

proc sort data=have out=want 
     dupout=dupes nodupkey ;
  by _all_;
run;

If this doesn't do what you want then provide a clearer example as to what the set without duplicates should look like.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 255 views
  • 0 likes
  • 3 in conversation