How to identify duplicates for data with IDs as a group of variables

Solved
Frequent Contributor
Posts: 75

My data is identified by a group of variables, say three. No single variable is unique on its own, but the combination of all three identifies a unique observation. It looks like this:

Var1     Var2     Var3
John     PHIL     PA
Mike     PHIL     PA
John     CHIC     IL
John     PHIL     PA

Observations 1 and 4 are duplicates, which raises the question: how can I identify duplicates in this data?

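With a single ID variable I can do this:

/* assumes HAVE is already sorted by ID */
data dups nodups;
    set have;
    by ID;
    /* an ID that is both first and last in its group occurs only once */
    if first.ID and last.ID then output nodups;
    else output dups;
run;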

But I'm not sure how to do it in this situation. I could sort the variables out one by one, but I wonder if there's a more efficient way.

Accepted Solutions
Solution
‎06-04-2015 02:38 AM
Contributor
Posts: 44

Re: How to identify duplicates for data with IDs as a group of variables

Posted in reply to NonSleeper

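Just sort by all three variables, then test FIRST. and LAST. on the last BY variable. Because first.var3 and last.var3 are also set whenever var1 or var2 changes, they mark the boundaries of each three-variable combination:

proc sort data=have;
    by var1 var2 var3;
run;

data dups nodups;
    set have;
    by var1 var2 var3;
    /* a combination that is both first and last in its group occurs only once */
    if first.var3 and last.var3 then output nodups;
    else output dups;
run;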
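
For anyone who wants to try this without sorting first, here is a minimal sketch of the same split done with PROC SQL. It is an alternative, not what the reply above uses. The DATALINES step just rebuilds the four sample rows from the question, and the dataset names have, dups and nodups follow the thread.

```sas
/* Rebuild the sample data from the question */
data have;
    input var1 $ var2 $ var3 $;
    datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;

proc sql;
    /* every row whose var1/var2/var3 combination occurs more than once */
    create table dups as
        select *
        from have
        group by var1, var2, var3
        having count(*) > 1;

    /* every row whose combination occurs exactly once */
    create table nodups as
        select *
        from have
        group by var1, var2, var3
        having count(*) = 1;
quit;
```

With the sample rows, dups should hold both John/PHIL/PA rows and nodups the other two. SAS writes a note to the log about remerging summary statistics, which is expected with this kind of query. If you would rather keep one copy of each combination and set aside only the extra copies, PROC SORT with the NODUPKEY and DUPOUT= options does that instead, but note that it splits the data differently from the DATA step above.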

All Replies
Frequent Contributor
Posts: 75

Re: How to identify duplicates for data with IDs as a group of variables

Oh...Wao...Yeh...

I think I'm gonna go home. Well, no : )
