There are a few ways to do this. It would help to see an example of the data you have and the result you want. Here's one solution:
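data dup nodup;
  set have;                       /* input must be sorted by variable */
  by variable;
  count + 1;                      /* sum statement: count is retained across rows */
  if first.variable then count = 1;
  if count > 1 then dup = 'Y';    /* second and later records of a key */
  if dup = 'Y' then output dup;
  else output nodup;
run;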
Or, to save yourself all that typing, you could do:
proc sort data=have out=uniques dupout=dups nodupkey;
  by <id variables>;
run;
And then, if you want only the records whose key occurs exactly once (this wasn't clear from the original post):
proc sql;
  create table WANT as
  select * from UNIQUES
  where <id variables> not in (select <id variables> from DUPS);
quit;
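The same split into singles and repeated keys can also be done in one DATA step pass with FIRST. and LAST. flags. This is a minimal sketch, assuming a single key variable called id and an input data set called have; both names are placeholders, not taken from the original post.

```sas
/* Sketch only: id is a hypothetical key variable */
proc sort data=have out=sorted;
  by id;
run;

data singles dups;
  set sorted;
  by id;
  /* A key that occurs once is both first and last in its BY group */
  if first.id and last.id then output singles;
  else output dups;  /* every record of a repeated key, including the first */
run;
```

Unlike the NODUPKEY route, this sends the first record of a repeated key to dups as well, so singles holds only keys that appear exactly once.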
This is easy to do with PROC SORT.
If you want to keep one record from each group of duplicates, use NODUPKEY: the OUT= data set (Non_dup) gets the first record for each key, and the remaining duplicate records go to DUPOUT= (dup).
If you want to keep only the records whose key is unique, use NOUNIQUEKEY: those records go to UNIQUEOUT= (Unique), and all records with a repeated key go to OUT= (all_dup).
input x $ y;
/* BY x assumes x is the key variable */
proc sort data=have out=Non_dup dupout=dup nodupkey; by x; run;
proc sort data=have uniqueout=Unique out=all_dup nouniquekey; by x; run;
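To see where each record lands, the two PROC SORT calls can be run on a small made-up data set. The data values below are invented for illustration, and x is assumed to be the key variable.

```sas
/* Hypothetical sample data, for illustration only */
data have;
  input x $ y;
  datalines;
a 1
a 2
b 3
c 4
c 5
c 6
;
run;

/* NODUPKEY: first record per key goes to OUT=, the rest to DUPOUT= */
proc sort data=have out=Non_dup dupout=dup nodupkey;
  by x;
run;

/* NOUNIQUEKEY: keys that occur once go to UNIQUEOUT=, all others to OUT= */
proc sort data=have uniqueout=Unique out=all_dup nouniquekey;
  by x;
run;
```

With this input, Non_dup holds one record each for a, b and c, and dup holds the other three records for a and c. Unique holds only the b record, and all_dup holds all five records for a and c.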
Here are the pieces that you didn't tell us:
(a) What constitutes a duplicate? Is just one variable the same, or are all variables the same?
(b) If there are duplicates, should all of them go into the same data set? Or should the first one go into a separate data set and any additional duplicates go into a different data set?
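The answer to (a) changes the BY statement. As a sketch, assuming duplicates means records that match on every variable, BY _ALL_ compares whole records; the output data set names here are placeholders.

```sas
/* Sketch only: treats two records as duplicates when all variables match */
proc sort data=have out=distinct_rows dupout=exact_dups nodupkey;
  by _all_;
run;
```

If only some variables define a duplicate, list those variables in the BY statement instead of _ALL_.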