<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Matching e-mails with similar character strings in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Matching-e-mails-with-similar-character-strings/m-p/235712#M9507</link>
    <description>&lt;P&gt;Dear SAS Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to be able to link accounts with unique e-mails which are similar to each other, for example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Account 1. vladimr241@gmail.com&lt;/P&gt;&lt;P&gt;Account 2. vladimr231@gmail.com&lt;/P&gt;&lt;P&gt;Account 3. vladim1245@gmail.com&lt;/P&gt;&lt;P&gt;Account 4. vladimra3333@gmail.com&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The ultimate goal would be to create a summary table which would say that, based on the example above, we are dealing with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- 1&amp;nbsp;account holder (1 person responsible for creating all accounts)&amp;nbsp;linked with 4 similar e-mails.&lt;/P&gt;&lt;P&gt;-&amp;nbsp;Or we can&amp;nbsp;summarise it as&amp;nbsp;4 accounts linked with 1 e-mail (so we are still assuming 1 person responsible for creating all accounts, but this time we are saying that four accounts were created using the same, as in almost identical, e-mail address).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I came across a SAS PDF titled "Using Edit-Distance Functions to Identify “Similar” E-Mail Addresses", which discusses the SPEDIS, COMPLEV and COMPGED functions. Unfortunately, the things discussed there are quite vague and I probably need a slightly more basic tutorial, so I was wondering whether there is any standard query that would meet my requirements&amp;nbsp;(i.e. summarise it in the above-described way)&amp;nbsp;if I applied it to tens of thousands of e-mail addresses.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 20 Nov 2015 16:06:45 GMT</pubDate>
    <dc:creator>blazejmaksym</dc:creator>
    <dc:date>2015-11-20T16:06:45Z</dc:date>
    <item>
      <title>Matching e-mails with similar character strings</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Matching-e-mails-with-similar-character-strings/m-p/235712#M9507</link>
      <description>&lt;P&gt;Dear SAS Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to be able to link accounts with unique e-mails which are similar to each other, for example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Account 1. vladimr241@gmail.com&lt;/P&gt;&lt;P&gt;Account 2. vladimr231@gmail.com&lt;/P&gt;&lt;P&gt;Account 3. vladim1245@gmail.com&lt;/P&gt;&lt;P&gt;Account 4. vladimra3333@gmail.com&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The ultimate goal would be to create a summary table which would say that, based on the example above, we are dealing with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- 1&amp;nbsp;account holder (1 person responsible for creating all accounts)&amp;nbsp;linked with 4 similar e-mails.&lt;/P&gt;&lt;P&gt;-&amp;nbsp;Or we can&amp;nbsp;summarise it as&amp;nbsp;4 accounts linked with 1 e-mail (so we are still assuming 1 person responsible for creating all accounts, but this time we are saying that four accounts were created using the same, as in almost identical, e-mail address).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I came across a SAS PDF titled "Using Edit-Distance Functions to Identify “Similar” E-Mail Addresses", which discusses the SPEDIS, COMPLEV and COMPGED functions. Unfortunately, the things discussed there are quite vague and I probably need a slightly more basic tutorial, so I was wondering whether there is any standard query that would meet my requirements&amp;nbsp;(i.e. summarise it in the above-described way)&amp;nbsp;if I applied it to tens of thousands of e-mail addresses.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 16:06:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Matching-e-mails-with-similar-character-strings/m-p/235712#M9507</guid>
      <dc:creator>blazejmaksym</dc:creator>
      <dc:date>2015-11-20T16:06:45Z</dc:date>
    </item>
    <item>
      <title>Re: Matching e-mails with similar character strings</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Matching-e-mails-with-similar-character-strings/m-p/236810#M9508</link>
      <description>&lt;P&gt;It is a difficult question to answer. I tried the SOUNDEX function, but for tens of thousands of different&amp;nbsp;names you will have to find a different approach. I will try to find one and post the answer, if any.&lt;/P&gt;
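&lt;P&gt;One possible direction for that different approach, as a rough sketch only: compare the part of each address before the @ with the COMPLEV and COMPGED edit-distance functions mentioned in the question. The data set names emails and similar_pairs, the variables localpart and domain, and the cutoff of 4 are made up for illustration and would need tuning on real data.&lt;/P&gt;
&lt;P&gt;/* Hypothetical sketch: names and the cutoff are assumptions, not a standard query. */&lt;BR /&gt;data emails;&lt;BR /&gt;input email :$40.;&lt;BR /&gt;length localpart domain $40;&lt;BR /&gt;/* Split the address at the @ sign and normalise case. */&lt;BR /&gt;localpart = lowcase(scan(email, 1, '@'));&lt;BR /&gt;domain = lowcase(scan(email, 2, '@'));&lt;BR /&gt;datalines;&lt;BR /&gt;vladimr241@gmail.com&lt;BR /&gt;vladimr231@gmail.com&lt;BR /&gt;vladim1245@gmail.com&lt;BR /&gt;vladimra3333@gmail.com&lt;BR /&gt;hello@gmail.com&lt;BR /&gt;;&lt;BR /&gt;run;&lt;BR /&gt;proc sql;&lt;BR /&gt;/* Self-join: keep each unordered pair once, same domain only, small edit distance. */&lt;BR /&gt;create table similar_pairs as&lt;BR /&gt;select a.email as email_a,&lt;BR /&gt;b.email as email_b,&lt;BR /&gt;complev(a.localpart, b.localpart) as lev,&lt;BR /&gt;compged(a.localpart, b.localpart) as ged&lt;BR /&gt;from emails as a, emails as b&lt;BR /&gt;where a.domain = b.domain&lt;BR /&gt;and a.email lt b.email&lt;BR /&gt;and complev(a.localpart, b.localpart) le 4&lt;BR /&gt;order by lev;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;This still compares every pair of addresses within a domain, so for tens of thousands of e-mails the join would probably need to be restricted further before it is practical. Turning the pairs into one row per account holder is a separate grouping step that this sketch does not cover.&lt;/P&gt;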
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;/* SOUNDEX attempt: compute a SOUNDEX code for each address. */&lt;BR /&gt;data newdata;&lt;BR /&gt;input Emailid :$30.;&lt;BR /&gt;Emailid1 = soundex(Emailid);&lt;BR /&gt;datalines;&lt;BR /&gt;vladimr241@gmail.com&lt;BR /&gt;vladimr231@gmail.com&lt;BR /&gt;vladim1245@gmail.com&lt;BR /&gt;vladimra100000@gmail.com&lt;BR /&gt;Hello@gmail.com&lt;BR /&gt;Val@1234567890&lt;BR /&gt;;&lt;BR /&gt;run;&lt;BR /&gt;/* List each address next to its SOUNDEX code. */&lt;BR /&gt;proc print data=newdata;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Nov 2015 14:39:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Matching-e-mails-with-similar-character-strings/m-p/236810#M9508</guid>
      <dc:creator>pearsoninst</dc:creator>
      <dc:date>2015-11-28T14:39:18Z</dc:date>
    </item>
  </channel>
</rss>

