BookmarkSubscribeRSS Feed
leahcho
Obsidian | Level 7

Hi,

 

I have a dataset

 

ID   age     var1

1     30       1

1     30       2

1     32       1

1     33       3

How do I remove duplicate ID and age ? It doesn't matter if I keep the first or last of duplicate

So the I want my table to look like

 

ID age var1

1   30    1

1   32    1

1   33    3

 

Thanks

4 REPLIES 4
andreas_lds
Jade | Level 19
Look up the documentation of proc sort, especially the nodule option. Using Id and age as by-vars should solve the problem.
Shmuel
Garnet | Level 18

You can remove duplicats, either by

proc sort data=have out=want NODUPKEY;
  by id age;
run;

or by SQL:

proc sql;
    create table want as
    select * from have
    group by id, age
; quit;

If you compare the rwo methods, you'll find that -

the one keeps the 1st occurence while the other keeps the last occurence.

As much as I remeber, SQL keeps the 1st occurence.

 

Astounding
PROC Star

Given that your data set is already in sorted order by ID AGE, you could try:

 

data want;

set have;

by id age;

if first.age;

run;

 

It's worth spending time on the BY statement in a DATA step and how it creates FIRST.AGE (and a few more variables).  Those will be tools you use over and over again.

DavyJones
Obsidian | Level 7

proc sort data=have; 

by ID age var1;

run;

 

data want;

set have;

by ID age var1;

if first.var1;

run;

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 9084 views
  • 0 likes
  • 5 in conversation