BookmarkSubscribeRSS Feed
gbeanie15
Calcite | Level 5

I have a dataset that I am trying to remove duplicates based on two variables, for example, name and date. The dataset looks something like this

 

Name          Date
John Doe    7/11/2020
John Doe    7/15/2020
John Doe    7/15/2020
Melissa Monday  7/5/2020
Melissa Monday  7/11/2020
Melissa Monday  7/11/2020
Sue Susan  7/1/2020
Sue Susan 7/5/2020
Sue Susan 7/5/2020

I am attempting to remove duplicates, but duplicate dates within each name. I have tried doing the following:

data nodups; set data; by name date; if (first.name and first.date) then output; run;

but that only gives me observations where it is the first name and also the first date in the entire dataset, not the first date within the individual dataset. Does anyone know how I can get to this?

3 REPLIES 3
smantha
Lapis Lazuli | Level 10

it cannot be first.name and first.date as by record 2 i.e. when _n_=2 first.name is no longer correct. You only need first.date

data nodups; 
set data; by name date;
 if (first.date) then output; run;

Else you could try noduprec option or nodupkey option when you use procsort

andreas_lds
Jade | Level 19

What do you expect as result?

Have you tried proc sort?

proc sort data=have out=nodups nodupkey;
   by Name Date;
run;
Kurt_Bremser
Super User

The variables in a BY statement form a hierarchy from left to right, with left being the most significant level.

This means that, when this is used:

by name date;

first.name = true (1) automatically implies first.date = true, but first.date can happen without first.name being true.

 

Which all means that you only have to use first.date as an indicator:

data nodups;
set data;
by name date;
if first.date;
run;

and, since all you want to do is restrict which observations reach the end of a data step iteration, you can use the Subsetting IF 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1804 views
  • 1 like
  • 4 in conversation