Desktop productivity for business analysts and programmers

identify duplicates

N/A
Posts: 0

identify duplicates

I'm working on a large clinical dataset. I'd like to extract duplicates into a new table. Any idea on how to do this?
Frequent Contributor
Posts: 81

Re: identify duplicates

Posted in reply to deleted_user
Not really an Enterprise Guide (EG) issue, but you could put this into a Code node:

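/* Sort so that records sharing the same key values sit next to each other */
PROC SORT DATA = inputdsn OUT = temp;
BY var1 var2 var3;
RUN;

/* Within each key, the last record goes to "unique"; earlier copies go to "duplicates" */
DATA unique duplicates;
SET temp;
BY var1 var2 var3;
IF NOT LAST.var3 THEN OUTPUT duplicates;
ELSE OUTPUT unique;
RUN;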

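"var1 var2 var3" are the variables used to identify the duplicated records. For each combination of these values, the last record goes to the "unique" data set, so "unique" holds exactly one record per key. Any earlier records with the same key, i.e. the extra copies, go to the "duplicates" data set.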
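If you would rather have every record of a repeated key in "duplicates" (not just the extra copies), one possible variation is to test FIRST. and LAST. together. This is only a sketch using the same placeholder names (inputdsn, var1 var2 var3), so adjust it to your own data:

```sas
/* Sort so that records sharing the same key values sit next to each other */
PROC SORT DATA = inputdsn OUT = temp;
BY var1 var2 var3;
RUN;

DATA unique duplicates;
SET temp;
BY var1 var2 var3;
/* FIRST.var3 and LAST.var3 are both true only when the key occurs exactly once */
IF FIRST.var3 AND LAST.var3 THEN OUTPUT unique;
ELSE OUTPUT duplicates;
RUN;
```

With this version "unique" contains only the keys that appear once, and "duplicates" contains all records of any key that appears two or more times.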

Is this what you were looking for?

.............Phil

Message was edited by: prholland
N/A
Posts: 0

Re: identify duplicates

Posted in reply to deleted_user
It's just what I'm looking for, although I was hoping there would be a feature in Enterprise Guide that would do it...
New Contributor
Posts: 3

Re: identify duplicates

Posted in reply to deleted_user
If your client is version 9, then you can use the DUPOUT= option of PROC SORT together with NODUPKEY:

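/* Test data: x = 1..6, followed by repeats of x = 1 and x = 2 */
data in;
do x=1 to 6; output; end;
do x=1 to 2; output; end;
run;
/* NODUPKEY keeps one row per x in "out"; DUPOUT= writes the dropped rows (x=1, x=2) to "dupes" */
proc sort data=in out=out nodupkey dupout=dupes;
by x;
run;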
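Applied to the placeholder names from the earlier reply (inputdsn, var1 var2 var3, which are stand-ins and not real names), the same option might look like the sketch below. Note one difference: NODUPKEY keeps the first record of each BY group, whereas the DATA step approach above keeps the last.

```sas
/* One record per key goes to "unique"; the records removed as duplicates go to "duplicates" */
proc sort data=inputdsn out=unique nodupkey dupout=duplicates;
by var1 var2 var3;
run;
```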

Colin