I have a dataset that I am trying to remove duplicates based on two variables, for example, name and date. The dataset looks something like this
Name          Date
John Doe    7/11/2020
John Doe    7/15/2020
John Doe    7/15/2020
Melissa Monday  7/5/2020
Melissa Monday  7/11/2020
Melissa Monday  7/11/2020
Sue Susan  7/1/2020
Sue Susan 7/5/2020
Sue Susan 7/5/2020I am attempting to remove duplicates, but duplicate dates within each name. I have tried doing the following:
data nodups; set data; by name date; if (first.name and first.date) then output; run;but that only gives me observations where it is the first name and also the first date in the entire dataset, not the first date within the individual dataset. Does anyone know how I can get to this?
it cannot be first.name and first.date as by record 2 i.e. when _n_=2 first.name is no longer correct. You only need first.date
data nodups; 
set data; by name date;
 if (first.date) then output; run;Else you could try noduprec option or nodupkey option when you use procsort
What do you expect as result?
Have you tried proc sort?
proc sort data=have out=nodups nodupkey;
   by Name Date;
run;The variables in a BY statement form a hierarchy from left to right, with left being the most significant level.
This means that, when this is used:
by name date;first.name = true (1) automatically implies first.date = true, but first.date can happen without first.name being true.
Which all means that you only have to use first.date as an indicator:
data nodups;
set data;
by name date;
if first.date;
run;and, since all you want to do is restrict which observations reach the end of a data step iteration, you can use the Subsetting IF
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
