Re: identify duplicates

deleted_user · Posted 08-01-2007 08:13 AM

I'm working on a large clinical dataset. I'd like to extract duplicates into a new table. Any ideas on how to do this?

prholland · Posted 08-01-2007 08:47 AM

This isn't really an Enterprise Guide (EG) issue, but you could put this into a Code node:

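/* Sort so that records sharing the same key values are adjacent */
PROC SORT DATA = inputdsn OUT = temp;
BY var1 var2 var3;
RUN;

/* LAST.var3 is 1 on the last record of each var1/var2/var3 group: */
/* keep that record in "unique", send any earlier copies to "duplicates" */
DATA unique duplicates;
SET temp;
BY var1 var2 var3;
IF NOT LAST.var3 THEN OUTPUT duplicates;
ELSE OUTPUT unique;
RUN;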

"var1 var2 var3" are the variables used to identify the duplicated records (the key). Your duplicate records will be in the "duplicates" data set, i.e. every copy of a key except the last one. The "unique" data set will hold exactly one record for each key combination.
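If you would rather have every record whose key occurs more than once (including the first copy) in "duplicates", and only keys that occur exactly once in "unique", a variant of the same DATA step can test FIRST. and LAST. together. This is a sketch: the small input data set is made up for illustration, and inputdsn and var1-var3 are the same placeholder names as above.

```sas
/* Hypothetical sample data: key (1,1,1) appears twice, the others once */
DATA inputdsn;
INPUT var1 var2 var3;
DATALINES;
1 1 1
1 1 1
1 1 2
2 1 1
;
RUN;

/* Sort so that records sharing the same key values are adjacent */
PROC SORT DATA = inputdsn OUT = temp;
BY var1 var2 var3;
RUN;

/* A record that is both FIRST. and LAST. is the only one with its key */
DATA unique duplicates;
SET temp;
BY var1 var2 var3;
IF FIRST.var3 AND LAST.var3 THEN OUTPUT unique;
ELSE OUTPUT duplicates;
RUN;
```

With this sample, both (1,1,1) records go to "duplicates", while (1,1,2) and (2,1,1) go to "unique".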

Is this what you were looking for?

.............Phil


deleted_user · Posted 08-01-2007 09:12 AM

It's just what I'm looking for, although I was hoping there would be a feature in Enterprise Guide that would do it.

Colin · Posted 08-01-2007 11:37 AM

If you're running SAS 9, you can use the DUPOUT= option of PROC SORT:

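/* Test data: x = 1 to 6, then 1 and 2 again */
data in;
do x=1 to 6; output; end;
do x=1 to 2; output; end;
run;
/* NODUPKEY keeps the first record per x in OUT=; the extra copies go to DUPOUT= */
proc sort data=in out=out nodupkey dupout=dupes;
by x;
run;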
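Applied to the original question, the same options would split a data set in one step. A sketch, assuming the placeholder names from the DATA-step reply (inputdsn, unique, duplicates, var1-var3) and a made-up input data set with a hypothetical extra variable visit. One difference worth knowing: NODUPKEY keeps the first record of each key in OUT=, whereas the LAST.var3 DATA step keeps the last one, which matters when the non-key variables differ.

```sas
/* Hypothetical sample data: key (1,1,1) appears twice with different visits */
DATA inputdsn;
INPUT var1 var2 var3 visit;
DATALINES;
1 1 1 10
1 1 1 20
2 1 1 10
;
RUN;

/* NODUPKEY keeps the first record per key in OUT=; */
/* later records with the same key are written to DUPOUT= */
PROC SORT DATA = inputdsn OUT = unique NODUPKEY DUPOUT = duplicates;
BY var1 var2 var3;
RUN;
```

Here "unique" gets the visit=10 record for each key, and "duplicates" gets the single (1,1,1) record with visit=20.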

Colin
