Finding Duplicates

New Contributor
Posts: 2

I am trying to find duplicates in one column that have different values in another: for example, different sender names with the same sender address. I am using Enterprise Guide (EG) and I am not sure which option would be best.

For example:

Sender Name    Sender Address
John Doe       123 Main St.
Jane Doe       123 Main St.

Thanks for the help!

Super User
Posts: 7,401

Re: Finding Duplicates

Can you use proc freq on the data?

Respected Advisor
Posts: 3,124

Re: Finding Duplicates

If you don't mind writing some SAS code, and assuming name and address are all you have, then try the following:

proc sql;
  create table want /* this is your output table */ as
    select * from have /* this is the input table */
    group by sender_address
    having count(*) > 1;
quit;

Or, as suggested, try using proc freq. In EG, you will find it under the task named "One-Way Frequencies".
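For reference, the proc freq route might look roughly like the sketch below. It assumes the same HAVE data set and SENDER_ADDRESS column as the query above; ADDR_COUNTS is a made-up output name. Note that it counts rows per address, not distinct sender names.

```sas
/* Sketch only: HAVE and SENDER_ADDRESS follow the example above;
   ADDR_COUNTS is a hypothetical output data set name. */
proc freq data=have noprint;
  /* OUT= writes one row per address with COUNT and PERCENT;
     the WHERE= option keeps only addresses that occur more than once. */
  tables sender_address / out=addr_counts (where=(count > 1));
run;
```

The resulting table lists each repeated address once with its row count, so it would still need to be joined back to HAVE to see the sender names.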

Haikuo

New Contributor
Posts: 2

Re: Finding Duplicates

There is a lot of data involved: I have 49 different columns. Proc freq will show me how many times an address is used, but I want to be able to output the instances where different senders are using the same address.

Respected Advisor
Posts: 3,124

Re: Finding Duplicates

You could try using Query Builder and/or One-Way Frequencies, or just run the code that I posted with a minor tweak:

proc sql;
  create table want /* this is your output table */ as
    select * from have /* this is the input table */
    group by sender_address
    having count(distinct sender_name) > 1;
quit;

This will give you all the rows for addresses that have more than one distinct sender name.
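With 49 columns, it may be easier to review just the name/address pairs. A possible variant, assuming the same HAVE table and column names (SHARED_ADDRESSES is a made-up output name), uses a subquery so it does not depend on proc sql remerging summary statistics:

```sas
/* Sketch only: HAVE, SENDER_NAME and SENDER_ADDRESS follow the example above;
   SHARED_ADDRESSES is a hypothetical output table name. */
proc sql;
  create table shared_addresses as
    select distinct sender_address, sender_name
    from have
    /* keep only addresses used by more than one distinct sender name */
    where sender_address in
      (select sender_address
       from have
       group by sender_address
       having count(distinct sender_name) > 1)
    order by sender_address, sender_name;
quit;
```

Names that differ only in case would count as distinct here; comparing upcase(sender_name) instead is one way to treat them as the same, if that is what you want.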

Haikuo

Update: I do realize you are an EG user, but posting numerous screenshots would be too much for me.

Contributor
Posts: 60

Re: Finding Duplicates

Hi,

If you have the DataFlux tool, you can do this very easily. You need to run the DataFlux job with the proper QKB attached to it.

It will generate the same ID for duplicate records.

Let me know if you want more elaboration.

Thanks

Pravin
