BookmarkSubscribeRSS Feed
Smitha9
Fluorite | Level 6

Hi,

I have a dataset

ID                 vin                   zip             

11221         7896            43567

11221         7654             43567

13456         5433              41323

13456         5433              41323

16754          6432              51678

16754         6432               51678

 

I want to remove the duplicates and want to save what I removed. in a separate dataset.

Removed_duplicates

ID               vin                     zip

13456        5433               41323

16754        6432                51678

 

thanks in advance.

 

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input ID vin zip;
datalines;
11221 7896 43567
11221 7654 43567
13456 5433 41323
13456 5433 41323
16754 6432 51678
16754 6432 51678
;

proc sort data = have nodupkey dupout = dups;
   by _ALL_;
run;

 

Result:

 

have

ID     vin   zip
11221  7654  43567
11221  7896  43567
13456  5433  41323
16754  6432  51678

dups

ID     vin   zip
13456  5433  41323
16754  6432  51678
PeterClemmensen
Tourmaline | Level 20

Alternatively, a data step approach

 

data have dups;

   if _N_ = 1 then do;
      dcl hash h(dataset : 'have(obs = 0)');
      h.definekey(all : 'Y');
      h.definedone();
   end;

   set have;

   if h.add() = 0 then output have;
   else                output dups;

run;
ballardw
Super User

Maybe this gets you started. Please note the DATA step code to provide example data that we can test code against.

This removes all records with duplicate values.

data have;
   input ID vin zip ;
datalines;
11221 7896  43567
11221 7654 43567
13456 5433  41323
13456 5433  41323
16754  6432  51678
16754 6432 51678
;

proc sort data=have out=duplicates 
     uniqueout=Havesort nouniquekey ;
  by _all_;
run;

The Havesort data set in the output is the sorted values of the Unique or "not duplicated" values. The Duplicates set will have all the records with the duplicate values, not just one.

This leaves one value of each of the duplicates in the Want set with the remainder of the duplicates in the Dupes set:

proc sort data=have out=want 
     dupout=dupes nodupkey ;
  by _all_;
run;

If this doesn't do what you want then provide a clearer example as to what the set without duplicates should look like.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 881 views
  • 0 likes
  • 3 in conversation