BookmarkSubscribeRSS Feed
gbeanie15
Calcite | Level 5

I have a dataset that I am trying to remove duplicates based on two variables, for example, name and date. The dataset looks something like this

 

Name          Date
John Doe    7/11/2020
John Doe    7/15/2020
John Doe    7/15/2020
Melissa Monday  7/5/2020
Melissa Monday  7/11/2020
Melissa Monday  7/11/2020
Sue Susan  7/1/2020
Sue Susan 7/5/2020
Sue Susan 7/5/2020

I am attempting to remove duplicates, but duplicate dates within each name. I have tried doing the following:

data nodups; set data; by name date; if (first.name and first.date) then output; run;

but that only gives me observations where it is the first name and also the first date in the entire dataset, not the first date within the individual dataset. Does anyone know how I can get to this?

3 REPLIES 3
smantha
Lapis Lazuli | Level 10

it cannot be first.name and first.date as by record 2 i.e. when _n_=2 first.name is no longer correct. You only need first.date

data nodups; 
set data; by name date;
 if (first.date) then output; run;

Else you could try noduprec option or nodupkey option when you use procsort

andreas_lds
Jade | Level 19

What do you expect as result?

Have you tried proc sort?

proc sort data=have out=nodups nodupkey;
   by Name Date;
run;
Kurt_Bremser
Super User

The variables in a BY statement form a hierarchy from left to right, with left being the most significant level.

This means that, when this is used:

by name date;

first.name = true (1) automatically implies first.date = true, but first.date can happen without first.name being true.

 

Which all means that you only have to use first.date as an indicator:

data nodups;
set data;
by name date;
if first.date;
run;

and, since all you want to do is restrict which observations reach the end of a data step iteration, you can use the Subsetting IF 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2666 views
  • 1 like
  • 4 in conversation