topic Re: Is there any way to delete duplicate records in a dataset in SAS Programming

Is there any way to delete duplicate records in a dataset

deleted_user — Mon, 04 May 2009 18:58:06 GMT

Hi,

Is there any way to delete exact duplicate records and write out only one record from each duplicated set?

Say, for example, my infile has a set of 5 exact duplicate records and I want to delete the other 4 and write out just 1 record.

thanks,
sasbase9

Re: Is there any way to delete duplicate records in a dataset

sbb — Mon, 04 May 2009 19:46:32 GMT

Explore PROC SORT and its DUPOUT= option.

Scott Barry
SBBWorks, Inc.

Re: Is there any way to delete duplicate records in a dataset

barheat — Tue, 05 May 2009 02:52:34 GMT

Proc Sort with the noduplicates option works.

proc sort data=dsname out=sorted noduplicates;
by var1 var2 ...;
run;

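The noduplicates option removes records that are exactly the same in every variable, as long as the sort leaves them next to each other (listing all variables in the by statement guarantees this).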
The nodupkey option removes records where the by variables are the same.
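
A minimal sketch of the difference between the two options, assuming a made-up dataset DEMO (the dataset and variable names here are illustrative only, not from the original question):

[pre]
/* Hypothetical sample data: two identical rows plus one row
   that shares the key (id) but differs in amount. */
data demo;
input id amount;
cards;
1 10
1 10
1 20
2 30
;
run;

/* NODUPRECS with every variable in the BY statement:
   drops only fully identical rows, leaving 3 observations. */
proc sort data=demo out=norecs noduprecs;
by id amount;
run;

/* NODUPKEY: keeps the first row for each value of id,
   leaving 2 observations. */
proc sort data=demo out=nokeys nodupkey;
by id;
run;
[/pre]

NODUPRECS is the documented spelling of the record-level option; shorter forms such as nodup are aliases.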

Hope this helps.

Re: Is there any way to delete duplicate records in a dataset

deleted_user — Tue, 12 May 2009 09:40:32 GMT

proc sort data=x nodups dupout=dup;
by id;
run;

Now the duplicate observations move to the dup dataset, and x holds the master records.

Re: Is there any way to delete duplicate records in a dataset

data_null__ — Tue, 12 May 2009 11:26:01 GMT

Are you sure?

What do you expect the output of this program to be?

[pre]
data have;
input a b c;
cards;
1 2 3
1 1 3
1 2 3
;;;;
run;
proc sort data=have nodup out=nodups;
by a;
run;
[/pre]

From the online doc:
[pre]
If you specify this option, then PROC SORT compares all variable values for each
observation to those for the previous observation that was written to the output data set.
If an exact match is found, then the observation is not written to the output data set.[/pre]

It goes on to say using BY _ALL_ will result in the expected output...
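
As a sketch of that suggestion applied to the HAVE dataset above (the output name NODUPS_ALL is just an illustrative choice): with BY A alone, the two 1 2 3 rows are not adjacent after the sort, so the comparison with the previously written observation never finds a match. Sorting by every variable puts identical observations next to each other.

[pre]
/* BY _ALL_ sorts on every variable, so identical rows become adjacent
   and NODUPRECS can drop the repeats. */
proc sort data=have out=nodups_all noduprecs;
by _all_;
run;

/* Two observations remain: 1 1 3 and 1 2 3. */
proc print data=nodups_all;
run;
[/pre]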

Re: Is there any way to delete duplicate records in a dataset

deleted_user — Thu, 11 Jun 2009 20:55:48 GMT

Hi,
You can use the following code if you are not deleting on the basis of any key (UNION without ALL removes duplicate rows):

proc sql noprint;
create table Temp2
as
(select * from Temp1
union select * from Temp1);
quit;
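
A more direct way to express the same idea, offered here as an alternative sketch rather than as the poster's method, is SELECT DISTINCT, which also keeps one copy of each fully identical row. Temp1 and Temp2 are the table names used above.

[pre]
proc sql noprint;
/* DISTINCT * compares all columns, so only exact duplicate rows collapse. */
create table Temp2 as
select distinct * from Temp1;
quit;
[/pre]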