DATA Step, Macro, Functions and more

remove duplicates

Posts: 29

remove duplicates



I have a dataset


ID   age     var1

1     30       1

1     30       2

1     32       1

1     33       3

How do I remove duplicate ID and age ? It doesn't matter if I keep the first or last of duplicate

So the I want my table to look like


ID age var1

1   30    1

1   32    1

1   33    3



Valued Guide
Posts: 519

Re: remove duplicates

Look up the documentation of proc sort, especially the nodule option. Using Id and age as by-vars should solve the problem.
Trusted Advisor
Posts: 1,831

Re: remove duplicates

Posted in reply to andreas_lds

You can remove duplicats, either by

proc sort data=have out=want NODUPKEY;
  by id age;

or by SQL:

proc sql;
    create table want as
    select * from have
    group by id, age
; quit;

If you compare the rwo methods, you'll find that -

the one keeps the 1st occurence while the other keeps the last occurence.

As much as I remeber, SQL keeps the 1st occurence.


Super User
Posts: 6,632

Re: remove duplicates

[ Edited ]

Given that your data set is already in sorted order by ID AGE, you could try:


data want;

set have;

by id age;

if first.age;



It's worth spending time on the BY statement in a DATA step and how it creates FIRST.AGE (and a few more variables).  Those will be tools you use over and over again.

Occasional Contributor
Posts: 14

Re: remove duplicates

proc sort data=have; 

by ID age var1;



data want;

set have;

by ID age var1;

if first.var1;


Ask a Question
Discussion stats
  • 4 replies
  • 5 in conversation