blazejmaksym

Dear SAS Community,

I would like to link accounts whose e-mail addresses are unique but very similar to each other, for example:

Account 1. vladimr241@gmail.com

Account 2. vladimr231@gmail.com

Account 3. vladim1245@gmail.com

Account 4. vladimra3333@gmail.com

The ultimate goal is to create a summary table which, based on the example above, would say that we are dealing with:

- 1 account holder (1 person responsible for creating all accounts) linked with 4 similar emails.

- Or we can summarise it as 4 accounts linked with 1 e-mail address (we still assume that 1 person created all the accounts, but this time we say that the four accounts were created using the same, i.e. almost identical, e-mail address).
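One way to produce a summary like this is sketched below. It assumes each account has exactly one e-mail address and that similarity is judged only on the part before the @, using the COMPGED function. The grouping step is a simple greedy "leader" clustering that I am swapping in for illustration (it is not taken from the SAS paper): each address joins the first existing group whose representative is within a threshold, otherwise it starts a new group. The dataset names, ACCOUNT_ID, HOLDER_ID and the threshold of 300 are all hypothetical and would need tuning on your own data.

```sas
/* Hypothetical COMPGED threshold: tune it on pairs you know are duplicates */
%let max_ged = 300;

data accounts;
   length email $60 local_part $60;
   input email :$60.;
   account_id = _n_;                          /* hypothetical account key  */
   local_part = lowcase(scan(email, 1, '@')); /* compare text before the @ */
   datalines;
vladimr241@gmail.com
vladimr231@gmail.com
vladim1245@gmail.com
vladimra3333@gmail.com
;
run;

/* Greedy leader clustering: each address joins the first group whose   */
/* representative is within &max_ged, otherwise it starts a new group.  */
data clustered;
   set accounts;
   array reps[50000] $60 _temporary_;  /* group representatives (retained) */
   retain n_groups 0;
   holder_id = .;
   do k = 1 to n_groups while (missing(holder_id));
      if compged(strip(local_part), strip(reps[k])) <= &max_ged then holder_id = k;
   end;
   if missing(holder_id) then do;
      n_groups + 1;
      reps[n_groups] = local_part;
      holder_id = n_groups;
   end;
   drop k n_groups;
run;

/* One row per presumed account holder */
proc sql;
   create table holder_summary as
   select holder_id,
          count(*)              as n_accounts,
          count(distinct email) as n_emails,
          min(email)            as example_email
   from clustered
   group by holder_id;
quit;

proc print data=holder_summary noobs;
run;
```

The number of rows in HOLDER_SUMMARY is the number of presumed account holders, and N_ACCOUNTS gives the "accounts linked with 1 e-mail" view; whether the four example addresses land in one group depends on the threshold. Because the grouping is greedy, the result depends on the order of the input rows, and the 50000-element array caps the number of groups. A proper connected-components step (for example PROC OPTNET, if SAS/OR is licensed) would avoid the order dependence.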


I came across a SAS paper titled "Using Edit-Distance Functions to Identify “Similar” E-Mail Addresses", which discusses the SPEDIS, COMPLEV and COMPGED functions. Unfortunately, the material there is quite vague for me and I probably need a more basic tutorial, so I was wondering whether there is a standard query that would meet my requirements (i.e. produce the summary described above) when applied to tens of thousands of e-mail addresses.
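As a more basic starting point, the sketch below simply runs the three functions named in the paper (COMPLEV, COMPGED and SPEDIS) on every pair of the four example addresses, comparing only the part before the @. For all three, lower values mean more similar strings. The dataset names and ACCOUNT_ID are hypothetical.

```sas
/* Minimal sketch: pairwise edit distances between the example addresses */
data emails;
   length email $60 local_part $60;
   input email :$60.;
   account_id = _n_;                          /* hypothetical account key  */
   local_part = lowcase(scan(email, 1, '@')); /* compare text before the @ */
   datalines;
vladimr241@gmail.com
vladimr231@gmail.com
vladim1245@gmail.com
vladimra3333@gmail.com
;
run;

/* Self-join: every unordered pair of accounts appears exactly once */
proc sql;
   create table pair_distances as
   select a.local_part as local_a,
          b.local_part as local_b,
          complev(strip(a.local_part), strip(b.local_part)) as levenshtein,
          compged(strip(a.local_part), strip(b.local_part)) as ged,
          spedis(strip(a.local_part), strip(b.local_part))  as spedis_score
   from emails as a, emails as b
   where a.account_id < b.account_id;
quit;

proc print data=pair_distances noobs;
run;
```

COMPLEV counts single-character insertions, deletions and replacements (Levenshtein distance), COMPGED is a generalized edit distance that assigns different costs to different operations, and SPEDIS is scaled by the length of its first argument, so SPEDIS(a, b) and SPEDIS(b, a) can differ. Looking at these values for pairs you already know are (or are not) the same person is a reasonable way to choose a threshold.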


1 REPLY
pearsoninst

This is a difficult question to answer. I tried the SOUNDEX function, but for tens of thousands of different addresses you will need a different approach. I will try to find one and post the answer if I do.

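data newdata;
   /* Read each address as a character value of up to 30 characters */
   input Emailid :$30.;
   /* SOUNDEX returns a phonetic code; rows with equal codes are candidate matches */
   Emailid1 = soundex(Emailid);
   datalines;
vladimr241@gmail.com
vladimr231@gmail.com
vladim1245@gmail.com
vladimra100000@gmail.com
Hello@gmail.com
Val@1234567890
;
run;

proc print data=newdata;
run;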
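Comparing every pair does not scale well: with 10,000 addresses there are 10,000 × 9,999 / 2 = 49,995,000 pairs. A common record-linkage technique, called blocking, is to compare only addresses that share a cheap key. The sketch below assumes a key made of the domain plus the first three characters of the local part (a hypothetical choice: pairs that differ in those characters will never be compared) and passes COMPGED's optional cutoff argument, which makes the function return the cutoff value instead of the exact distance when the distance exceeds it. The dataset names, BLOCK_KEY and the threshold of 300 are assumptions.

```sas
%let max_ged = 300;   /* hypothetical threshold */

data emails_blocked;
   length email $60 local_part $60 domain $60 block_key $70;
   input email :$60.;
   account_id = _n_;                          /* hypothetical account key */
   local_part = lowcase(scan(email, 1, '@'));
   domain     = lowcase(scan(email, 2, '@'));
   /* Blocking key: domain plus first three characters of the local part */
   block_key  = catx('|', domain, substr(local_part, 1, 3));
   datalines;
vladimr241@gmail.com
vladimr231@gmail.com
vladim1245@gmail.com
vladimra100000@gmail.com
Hello@gmail.com
Val@1234567890
;
run;

/* Only pairs inside the same block are compared. The cutoff is max+1, */
/* so any pair returned with GED <= &max_ged has its exact distance.   */
proc sql;
   create table candidate_pairs as
   select a.email as email_a,
          b.email as email_b,
          compged(strip(a.local_part), strip(b.local_part), %eval(&max_ged + 1)) as ged
   from emails_blocked as a
        inner join emails_blocked as b
          on  a.block_key = b.block_key
          and a.account_id < b.account_id
   where calculated ged <= &max_ged;
quit;

proc print data=candidate_pairs noobs;
run;
```

With this key, Hello@gmail.com and Val@1234567890 sit in their own blocks and are never compared with the "vla" addresses. A looser key (for example the domain alone) catches more pairs at the cost of more comparisons.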

