blazejmaksym
Calcite | Level 5

Dear SAS Community

 

I would like to be able to link accounts with unique e-mails which are similar to each other, for example:

 

Account 1. vladimr241@gmail.com

Account 2. vladimr231@gmail.com

Account 3. vladim1245@gmail.com

Account 4. vladimra3333@gmail.com

 

The ultimate goal would be to create a summary table which would say that, based on the example above, we are dealing with:

 

- 1 account holder (1 person responsible for creating all accounts) linked with 4 similar emails.

- Or we can summarise it as 4 accounts linked with 1 e-mail (so we are still assuming 1 person responsible for creating all accounts, but this time we are saying that four accounts were created using the same, as in almost identical, e-mail address).

 

I came across a SAS PDF titled "Using Edit-Distance Functions to Identify “Similar” E-Mail Addresses", which discusses the SPEDIS, COMPLEV and COMPGED functions. Unfortunately, the material there is quite vague and I probably need a slightly more basic tutorial, so I was wondering whether there is a standard query that would meet my requirements (i.e. summarise the data in the way described above) when applied to tens of thousands of e-mail addresses.

 

 

1 REPLY
pearsoninst
Pyrite | Level 9

It is a difficult question to answer. I tried the SOUNDEX function, but for tens of thousands of different names you will need a different approach. I will try to find one and post the answer, if any.

 

 

/* Compute the SOUNDEX code of each e-mail address */
data newdata;
    input Emailid :$30.;
    Emailid1 = soundex(Emailid);
    datalines;
vladimr241@gmail.com
vladimr231@gmail.com
vladim1245@gmail.com
vladimra100000@gmail.com
Hello@gmail.com
Val@1234567890
;
run;

/* List each address next to its SOUNDEX code */
proc print data=newdata;
run;
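
As a possible alternative to SOUNDEX, here is a minimal sketch of the edit-distance idea from the question, using COMPLEV (Levenshtein distance) in a PROC SQL self-join. The data set name accounts, the variables account_id and local_part, and the cutoff of 4 are all assumptions made for illustration and are not from the original posts. The cutoff in particular would need tuning on real data.

```sas
/* Hypothetical sample data: one row per account (names are assumptions) */
data accounts;
    input account_id email :$40.;
    /* Compare only the part before the @, in lowercase */
    local_part = lowcase(scan(email, 1, '@'));
    datalines;
1 vladimr241@gmail.com
2 vladimr231@gmail.com
3 vladim1245@gmail.com
4 vladimra3333@gmail.com
5 hello@gmail.com
;
run;

/* Self-join: each pair of accounts is compared once (id1 < id2) */
proc sql;
    create table similar_pairs as
    select a.account_id as id1,
           b.account_id as id2,
           a.email      as email1,
           b.email      as email2,
           complev(a.local_part, b.local_part) as lev_dist
    from accounts as a, accounts as b
    where a.account_id < b.account_id
      and calculated lev_dist <= 4   /* assumed cutoff; tune on real data */
    order by lev_dist, id1, id2;
quit;

/* Each row is one pair of accounts whose addresses are within the cutoff */
proc print data=similar_pairs noobs;
run;
```

This only lists similar pairs. Turning the pairs into groups ("1 account holder linked with 4 e-mails") is a separate step that is not shown here. The self-join also compares every address with every other one, so for tens of thousands of addresses it would probably help to restrict comparisons first, for example to addresses with the same domain or the same first few letters.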
