Re: remove duplicates

leahcho · Posted 11-04-2017 09:54 PM

Hi,

I have a dataset:

ID age var1
1 30 1
1 30 2
1 32 1
1 33 3

How do I remove rows that duplicate both ID and age? It doesn't matter whether I keep the first or the last of the duplicates.

So I want my table to look like:

ID age var1
1 30 1
1 32 1
1 33 3

Thanks

andreas_lds · Posted 11-04-2017 10:39 PM

Look up the documentation of PROC SORT, especially the NODUPKEY option. Using ID and age as BY variables should solve the problem.

Shmuel · Posted 11-04-2017 10:59 PM

You can remove duplicates either by

proc sort data=have out=want NODUPKEY;
  by id age;
run;

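or by SQL:

proc sql;
    create table want as
    select id, age, min(var1) as var1  /* one row per id/age; keeps the smallest var1 */
    from have
    group by id, age;
quit;

Note that select * with GROUP BY and no summary function does not remove anything: SAS simply turns the GROUP BY into an ORDER BY, so the remaining columns need an aggregate such as min().

If you compare the two methods, you'll find that they can keep different rows: PROC SORT with NODUPKEY keeps the first occurrence in each id/age group, while the SQL above keeps the row value with the smallest var1, wherever it occurs.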
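To check which rows NODUPKEY actually discards, here is a minimal sketch using the DUPOUT= option of PROC SORT. It rebuilds the sample data from the question; the data set name dropped is only an illustrative choice.

```sas
/* Sample data from the original question */
data have;
  input ID age var1;
  datalines;
1 30 1
1 30 2
1 32 1
1 33 3
;
run;

/* NODUPKEY keeps the first row of each ID/age group;
   DUPOUT= writes the discarded rows to a separate data set
   (the name "dropped" is hypothetical) */
proc sort data=have out=want nodupkey dupout=dropped;
  by ID age;
run;

/* Inspect the rows that were removed */
proc print data=dropped;
run;
```

With the sample data, dropped should contain the single row 1 30 2, which confirms that the first age-30 row is the one kept.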

Astounding · Posted 11-04-2017 10:58 PM

Given that your data set is already in sorted order by ID AGE, you could try:

data want;
  set have;
  by id age;
  if first.age;
run;

It's worth spending time on the BY statement in a DATA step and how it creates FIRST.AGE (and a few more variables). Those will be tools you use over and over again.
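As a rough illustration of those automatic variables, the sketch below copies FIRST. and LAST. into ordinary variables so they can be printed (they are temporary and are not written to the output data set on their own). The names flags, first_id, last_id, first_age and last_age are made up for this example.

```sas
/* Sample data from the original question, already sorted by ID age */
data have;
  input ID age var1;
  datalines;
1 30 1
1 30 2
1 32 1
1 33 3
;
run;

data flags;
  set have;
  by ID age;
  /* FIRST./LAST. variables exist only during the step, so copy them */
  first_id  = first.ID;
  last_id   = last.ID;
  first_age = first.age;
  last_age  = last.age;
run;

proc print data=flags;
run;
```

For the two age-30 rows, first_age is 1 only on the first and last_age is 1 only on the second, which is why "if first.age;" keeps exactly one row per ID/age group.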

DavyJones · Posted 11-05-2017 08:48 PM

proc sort data=have;
  by ID age var1;
run;

data want;
  set have;
  by ID age;
  if first.age;  /* one row per ID/age; first.var1 would keep both age-30 rows */
run;
