topic Re: remove duplicates in SAS Programming

remove duplicates

leahcho — Sun, 05 Nov 2017 01:54:50 GMT

Hi,

I have a dataset

ID  age  var1
1   30   1
1   30   2
1   32   1
1   33   3

How do I remove rows with a duplicate ID and age? It doesn't matter whether I keep the first or the last of the duplicates.

So I want my table to look like this:

ID  age  var1
1   30   1
1   32   1
1   33   3

Thanks

Re: remove duplicates

andreas_lds — Sun, 05 Nov 2017 02:39:35 GMT

Look up the documentation of PROC SORT, especially the NODUPKEY option. Using ID and age as BY variables should solve the problem.

Re: remove duplicates

Astounding — Sun, 05 Nov 2017 03:02:16 GMT

Given that your data set is already in sorted order by ID AGE, you could try:

data want;
  set have;
  by id age;
  if first.age;
run;

It's worth spending time on the BY statement in a DATA step and how it creates FIRST.AGE (and a few more variables). Those will be tools you use over and over again.
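
To see what the BY statement creates, here is a minimal sketch that copies the automatic FIRST. and LAST. variables into ordinary variables so they can be printed. The data set name flags and the variables first_id, last_id, first_age and last_age are made up for illustration; the input is the sample data from the question.

```sas
/* Sample data from the question */
data have;
  input ID age var1;
  datalines;
1 30 1
1 30 2
1 32 1
1 33 3
;
run;

data flags;
  set have;
  by id age;
  /* FIRST. and LAST. variables are temporary and are not written  */
  /* to the output data set, so copy them into regular variables   */
  first_id  = first.id;
  last_id   = last.id;
  first_age = first.age;
  last_age  = last.age;
run;

proc print data=flags noobs;
run;
```

For the two rows with ID 1 and age 30, first_age is 1 on the first row and 0 on the second, which is why the subsetting IF on FIRST.AGE keeps only one row per ID/age combination.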

Re: remove duplicates

Shmuel — Sun, 05 Nov 2017 02:59:48 GMT

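You can remove duplicates either by

proc sort data=have out=want NODUPKEY;
  by id age;
run;

or by SQL:

proc sql;
    create table want as
    select id, age, min(var1) as var1
    from have
    group by id, age
; quit;

Note that GROUP BY only collapses rows when the query uses a summary function such as MIN. With a plain select *, PROC SQL turns the GROUP BY into an ORDER BY and keeps every row.

If you compare the two methods, you'll find that they can keep different rows: PROC SORT with NODUPKEY keeps the first occurrence it reads in each ID/age group, while the SQL query keeps the smallest var1 in each group.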

Re: remove duplicates

DavyJones — Mon, 06 Nov 2017 01:48:38 GMT

proc sort data=have;
  by ID age var1;
run;

data want;
  set have;
  by ID age var1;
  /* keep one row per ID/age: the one with the smallest var1 */
  if first.age;
run;