<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Compged - Email similarities. in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858464#M339182</link>
    <description>&lt;P&gt;Basic strategy: Compare all raw email addresses&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select a.EMAIL, b.EMAIL, compged(a.EMAIL, b.EMAIL) as SCORE
  from HAVE a, HAVE b
  where a.EMAIL ne b.EMAIL;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This can produce massive volumes of output, because every address is compared with every other address.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Smarter: Add some improvements as needed, depending on the data&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select a.EMAIL, b.EMAIL, compged(lowcase(a.EMAIL), lowcase(b.EMAIL)) as SCORE
  from HAVE a, HAVE b
  where a.EMAIL ne b.EMAIL
    and lowcase(first(a.EMAIL))=lowcase(first(b.EMAIL));
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here, we make the comparison case-insensitive, and we reduce the size of the join by adding a relevant criterion, such as the first letter being the same.&lt;/P&gt;
</description>
    <pubDate>Mon, 13 Feb 2023 02:05:48 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2023-02-13T02:05:48Z</dc:date>
    <item>
      <title>Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858455#M339179</link>
      <description>&lt;P&gt;Hi all.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Newish user to SAS here.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I'm hoping to get some guidance with the compged function, or another function that will return what I am after.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I have a dataset in which I want to look vertically, across rows, for emails that are similar to one another.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;e.g.&lt;BR /&gt;Data:&lt;BR /&gt;&lt;A href="mailto:John.Doe1@hotmail.com" target="_blank"&gt;John.Doe1@hotmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:JohnDoe1@hotmail.com" target="_blank"&gt;johndoe1@hotmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Johndoe123@hotmail.com" target="_blank"&gt;JohnDoe123@hotmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Mary_Ann1@hotmail.com" target="_blank"&gt;Mary_Ann1234@hotmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Mkj.Luke@hotmail.com" target="_blank"&gt;Mkj.Luke@hotmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Johndoe@yahoo.com" target="_blank"&gt;Johndoe@yahoo.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Ann_Jane123@gmail.com" target="_blank"&gt;Ann_Jane123@gmail.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:Luked123@outlook.com" target="_blank"&gt;Luked123@outlook.com&lt;/A&gt;&lt;BR /&gt;&lt;A href="mailto:lucky_star456@yahoo.com" target="_blank"&gt;lucky_star456@yahoo.com&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;How would I approach this to return a score for emails that are 'similar'?&amp;nbsp; Happy to build upon possible solutions!&lt;BR /&gt;&lt;BR /&gt;Thank you.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;S&lt;/P&gt;</description>
      <pubDate>Sun, 12 Feb 2023 23:57:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858455#M339179</guid>
      <dc:creator>Sdixon1</dc:creator>
      <dc:date>2023-02-12T23:57:04Z</dc:date>
    </item>
    <item>
      <title>Re: Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858464#M339182</link>
      <description>&lt;P&gt;Basic strategy: Compare all raw email addresses&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select a.EMAIL, b.EMAIL, compged(a.EMAIL, b.EMAIL) as SCORE
  from HAVE a, HAVE b
  where a.EMAIL ne b.EMAIL;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This can produce massive volumes of output, because every address is compared with every other address.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Smarter: Add some improvements as needed, depending on the data&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select a.EMAIL, b.EMAIL, compged(lowcase(a.EMAIL), lowcase(b.EMAIL)) as SCORE
  from HAVE a, HAVE b
  where a.EMAIL ne b.EMAIL
    and lowcase(first(a.EMAIL))=lowcase(first(b.EMAIL));
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here, we make the comparison case-insensitive, and we reduce the size of the join by adding a relevant criterion, such as the first letter being the same.&lt;/P&gt;
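&lt;P&gt;As a further sketch of the same idea, and only under assumptions not stated in the post (HAVE holds a single character column EMAIL; the alias EMAIL2 is introduced purely for illustration), the pre-filter could instead require the same domain, and each pair could be kept once rather than twice:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* Assumption: HAVE has one character column named EMAIL. */
  select a.EMAIL, b.EMAIL as EMAIL2,
         compged(lowcase(a.EMAIL), lowcase(b.EMAIL)) as SCORE
  from HAVE a, HAVE b
  /* Strict ordering keeps each unordered pair once and drops self-pairs. */
  where a.EMAIL &amp;lt; b.EMAIL
    /* Pre-filter: only compare addresses whose domain (text after '@') matches. */
    and lowcase(scan(a.EMAIL, 2, '@')) = lowcase(scan(b.EMAIL, 2, '@'));
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Whether a same-domain filter is appropriate depends on the data: it would skip pairs such as the same name registered with two different providers.&lt;/P&gt;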
</description>
      <pubDate>Mon, 13 Feb 2023 02:05:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858464#M339182</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2023-02-13T02:05:48Z</dc:date>
    </item>
    <item>
      <title>Re: Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858465#M339183</link>
      <description>&lt;P&gt;You could also use other criteria, such as similar length or the same domain.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Feb 2023 02:08:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858465#M339183</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2023-02-13T02:08:20Z</dc:date>
    </item>
    <item>
      <title>Re: Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858466#M339184</link>
      <description>&lt;P&gt;You could also add a filter on the output, such as:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  select a.EMAIL, b.EMAIL, compged(lowcase(a.EMAIL), lowcase(b.EMAIL)) as SCORE
  from HAVE a, HAVE b
  where a.EMAIL ne b.EMAIL
    and lowcase(first(a.EMAIL))=lowcase(first(b.EMAIL))
  having SCORE &amp;lt; 900;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 13 Feb 2023 02:10:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858466#M339184</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2023-02-13T02:10:06Z</dc:date>
    </item>
    <item>
      <title>Re: Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858869#M339348</link>
      <description>&lt;P&gt;Thank you for that solution&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;It has provided exactly what I am after.&lt;BR /&gt;&lt;BR /&gt;I've noticed the score is being affected by the&amp;nbsp;@domain, which produces false positives because the similarities in the domain are being scored as well.&lt;BR /&gt;&lt;BR /&gt;I created the 'TRIMS' function by Leonid Batkhan to remove the trailing characters after '@' as a workaround, however it doesn't appear to be working on my end.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Would you have any suggestions to remove the trailing characters?&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2023 05:19:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858869#M339348</guid>
      <dc:creator>Sdixon1</dc:creator>
      <dc:date>2023-02-15T05:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: Compged - Email similarities.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858874#M339351</link>
      <description>&lt;P&gt;The scan function should work: scan(email, 1, '@') returns the part of the address before the '@'.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2023 06:09:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compged-Email-similarities/m-p/858874#M339351</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2023-02-15T06:09:57Z</dc:date>
    </item>
  </channel>
</rss>

