Desktop productivity for business analysts and programmers

identify duplicates

N/A
Posts: 0

identify duplicates

I'm working on a large clinical dataset. I'd like to extract duplicates into a new table. Any idea on how to do this?
Frequent Contributor
Posts: 81

Re: identify duplicates

Not really an EG issue, but you could put this into a Code node:

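/* Sort by the key variables so that repeated keys sit next to each other */
PROC SORT DATA = inputdsn OUT = temp;
  BY var1 var2 var3;
RUN;

/* Keep the last record of each key in "unique"; send the rest to "duplicates" */
DATA unique duplicates;
  SET temp;
  BY var1 var2 var3;
  IF NOT LAST.var3 THEN OUTPUT duplicates;
  ELSE OUTPUT unique;
RUN;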

"var1 var2 var3" are the variables used to identify the duplicated records. The "unique" data set gets one record for each combination of those variables (the last one in each BY group). The "duplicates" data set gets the remaining records that share a key with one of them.
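
If every copy of a repeated key is wanted in the extracted table (not only the extra copies), a variant along the following lines might work. This is only a sketch: it reuses the placeholder names inputdsn, temp, var1, var2 and var3 from the code above, and assumes that a key occurring exactly once should count as unique.

```sas
/* Sketch: send every record of a repeated key to "duplicates".    */
/* inputdsn and var1-var3 are placeholders, not real names.        */
PROC SORT DATA = inputdsn OUT = temp;
  BY var1 var2 var3;
RUN;

DATA unique duplicates;
  SET temp;
  BY var1 var2 var3;
  /* A key that occurs once is both first and last in its BY group */
  IF FIRST.var3 AND LAST.var3 THEN OUTPUT unique;
  ELSE OUTPUT duplicates;
RUN;
```

With this version, "unique" holds only the keys that occur once, and "duplicates" holds all records of any key that occurs more than once.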

Is this what you were looking for?

.............Phil

Message was edited by: prholland
N/A
Posts: 0

Re: identify duplicates

It's just what I'm looking for, although I was hoping there would be a feature in Enterprise Guide that would do it...
New Contributor
Posts: 3

Re: identify duplicates

If your client is version 9, then you can use the DUPOUT= option of PROC SORT:

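/* Test data: x = 1 to 6, then x = 1 and 2 again */
data in;
  do x=1 to 6; output; end;
  do x=1 to 2; output; end;
run;
/* NODUPKEY keeps the first record for each value of x in "out";  */
/* DUPOUT= writes the discarded repeats (x = 1 and 2) to "dupes". */
proc sort data=in out=out nodupkey dupout=dupes;
  by x;
run;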

Colin