Solved: How to identify duplicates for data with IDs as a group of variables

NonSleeper · Posted 06-04-2015 02:31 AM

My data identifies each observation by a group of variables, say three. No single variable is unique on its own, but the three variables together identify a unique observation. It looks like this:

Var1 Var2 Var3
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA

Observations 1 and 4 are duplicates, which brings me to the question: how can I identify duplicates in this data?
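To make the example reproducible, the sample can be loaded with a DATA step like the sketch below. The dataset name HAVE matches the code in this thread; the variable lengths are assumptions and should be adjusted to the real data.

```sas
data have;
  /* assumed lengths: long enough for the sample values only */
  length var1 $ 10 var2 $ 4 var3 $ 2;
  input var1 $ var2 $ var3 $;
  datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;
```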

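If there were a single ID variable, I could do:

data dups nodups;
  set have;   /* assumes HAVE is already sorted by ID */
  by ID;
  if first.ID and last.ID then output nodups;   /* ID occurs exactly once */
  else output dups;                              /* ID occurs more than once */
run;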

But I'm not sure how to do this when the ID is made up of several variables. I could sort them out one by one, but I wonder if there's a more efficient way.

AskoLötjönen · Posted 06-04-2015 02:38 AM

Just sort by all three vars:

proc sort data=have;
  by var1 var2 var3;
run;

data dups nodups;
  set have;
  by var1 var2 var3;
  if first.var3 and last.var3 then output nodups;
  else output dups;
run;
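Testing only FIRST.var3 and LAST.var3 is enough because, with several BY variables, those flags are also set whenever var1 or var2 changes, so they mark the start and end of each var1/var2/var3 combination. As an alternative not taken from this thread, SAS 9.3 and later can do the same split in PROC SORT itself with the NOUNIQUEKEY and UNIQUEOUT= options; the sketch below assumes such a release is available.

```sas
/* Combinations that occur more than once go to DUPS (all copies kept);  */
/* combinations that occur exactly once go to NODUPS. HAVE is unchanged. */
proc sort data=have out=dups nouniquekey uniqueout=nodups;
  by var1 var2 var3;
run;
```

With the sample data, DUPS should hold both John PHIL PA rows and NODUPS the other two, matching the DATA step result.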

NonSleeper · Posted 06-04-2015 02:42 AM

Oh... Wow... Yeah...

I think I'm gonna go home. Well, no : )