DATA Step, Macro, Functions and more

Miss Match data from same dataset (get duplicates)

Reply
Contributor
Posts: 46

Miss Match data from same dataset (get duplicates)


Hi I am Sorry for small correction in data.

looking the duplicate data within the same dataset by the different fields

Ex:-

data have;

input name $5.+1 phone $5.;

cards;

AAAAA 12345

AAAAC 12345

AAAAA 09876

BBBBB 12345

CCCCC 99999

DDDDE 88888

DDDDD 88888

DDDDD 55555

EEEEE 44444

FFFFF  11111

;

RUN;

Am looking duplicate phone numbers by Name.(Same name having different Phone numbers)

Ex:-

Name Phone

AAAAA   12345

AAAAA   09876

DDDDE  88888

DDDDD  55555


I think we can use Soundex function to get exact duplicate values by proc sql;

plz any one help me with syntax.

Would Appreciate in Advance.

Contributor
Posts: 29

Re: Miss Match data from same dataset (get duplicates)

Hope this solves your issue,

proc sql;

create table want as select *,count(name) as count from have group by name having count > 1;

quit;

Sudeer

Super User
Super User
Posts: 7,992

Re: Miss Match data from same dataset (get duplicates)

Hi,

data have;
attrib Name Phone format=$200.;
name="AAA"; phone="12345"; output;
name="AAA"; phone="67890"; output;
name="BBB"; phone="fssfs"; output;
run;

proc sql;
     create table WANT as
     select     *
     from       HAVE
     group by NAME
     having    count(name)>1;
quit;

Contributor
Posts: 46

Re: Miss Match data from same dataset (get duplicates)

Thnank you for your valuable information

I Think we can use SOUNDEX Function to get exact Duplicates values.

proc sql;

create table dups as select distinct a.*, b.*

from have as a,have as b

where soundex(a.name) eq soundex(b.name) and a.phone ne b.phone

order by a.name

;

quit;

Super Contributor
Posts: 340

Re: Miss Match data from same dataset (get duplicates)

less elegant, but if you don´t care to sort and merge:


Proc Sort Data=have;
  By name;
Run;

Proc Freq Data=Have NoPrint;
  Tables name / Out=Have_F (Keep=Count name);
Run;

Data Want (Where=(Count ge 2));
  Merge Have (in=inH) Have_F (in=inF);
  By Name;
  If inH;
Run;

Contributor
Posts: 29

Re: Miss Match data from same dataset (get duplicates)

proc sql;

create table want as select * from have group by name having count > 1;

quit;

proc sort data=want out=dup nodupkey;

by name phone;

run;

Sudeer

Super User
Posts: 10,044

Re: Miss Match data from same dataset (get duplicates)

Still not full sure what you try to get.

data have;
input name $5.+1 phone $5.;
cards;
AAAAA 12345
AAAAC 12345
AAAAA 09876
BBBBB 12345
CCCCC 99999
DDDDE 88888
DDDDD 88888
DDDDD 55555
EEEEE 44444
FFFFF  11111
;
RUN;
proc sql;
select * from have group by name having count(distinct phone) ge 2;
quit;

Xia Keshan

Ask a Question
Discussion stats
  • 6 replies
  • 328 views
  • 6 likes
  • 5 in conversation