<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: fuzzy matching in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/566311#M159136</link>
    <description>&lt;P&gt;I'll see your thousand records and raise you 12,000. &lt;img id="smileywink" class="emoticon emoticon-smileywink" src="https://communities.sas.com/i/smilies/16x16_smiley-wink.png" alt="Smiley Wink" title="Smiley Wink" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have had to check whether any of roughly 13,000 records in one data set also appear in another data system where names are stored in very different forms. One had first, last, and middle names, plus things like Junior or II, all in a single field without any fixed order. And some had two last names related to parents.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 14 Jun 2019 22:11:37 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2019-06-14T22:11:37Z</dc:date>
    <item>
      <title>fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565843#M158926</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have the dataset below, shown as a table with fname and lname. The two variables in each row should count as a perfect match under fuzzy matching, because they are different versions of names from the same people.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;Obs&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;fname&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;lname&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales-Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;2&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales – Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;3&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;4&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;MoralesRodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;5&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales – De – Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;6&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales-Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales De Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;&lt;P&gt;&lt;STRONG&gt;7&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Morales Rodriguez&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;MoralesDeRodriguez&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using this code to match them.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data final2;
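	set work.final;
	/* Characters to strip before comparing: blank, comma, period, exclamation mark, en dash, hyphen */
	delims = ' ,.!–-';
	fname2 = compress(fname, delims);
	lname2 = compress(lname, delims);
	/* Generalized edit distance and Levenshtein distance; 0 means identical. */
	/* Modifiers: I ignores case, N strips n-literal quotes and ignores case, L removes leading blanks. */
	score_compged = compged(fname2, lname2, 'INL');
	score2_complev = complev(fname2, lname2, 'INL');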
run;

proc print data=final2;
run;

ods rtf close;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Do you have any better code than this one?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Bikash&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2019 13:44:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565843#M158926</guid>
      <dc:creator>bikashten</dc:creator>
      <dc:date>2019-06-13T13:44:27Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565881#M158936</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;but both variables should be perfect match&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;How are you defining a perfect match? A perfect match would not use fuzzy matching or COMPGED at all.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2019 15:09:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565881#M158936</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-06-13T15:09:18Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565913#M158950</link>
      <description>Hi Reeza,&lt;BR /&gt;fname is the child's last name and lname is the parent's last name, and I am trying to match them to find out whether the child is from the same parent or not. In the database they are stored with variations like those in the table above, but they should be the same. I am just trying to match them in such a way that they are treated as the same. I have more than 10,000 records like this. Please let me know whether this makes sense.</description>
      <pubDate>Thu, 13 Jun 2019 16:23:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565913#M158950</guid>
      <dc:creator>bikashten</dc:creator>
      <dc:date>2019-06-13T16:23:42Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565918#M158954</link>
      <description>If you have 10,000 records, I'd be doing the matching manually. There's no easy way to solve this. The general approach is to do exact matches first and remove those. Then do variations on fuzzy matching: start with COMPGED and take the best scores (for COMPGED, the lowest costs), then remove those matched records. Repeat with a different function, such as SOUNDEX() or COMPLEV(), and try again. Rinse and repeat. It's not fun or easy, but it's doable. There's a SAS tool called Link King that may help, and it's free.</description>
      <pubDate>Thu, 13 Jun 2019 16:40:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565918#M158954</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-06-13T16:40:11Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565949#M158977</link>
      <description>That's what I am doing right now, but I posted in case there is a better alternative. It's not fun to do it manually for over a thousand records. Thanks, Bikash</description>
      <pubDate>Thu, 13 Jun 2019 17:25:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/565949#M158977</guid>
      <dc:creator>bikashten</dc:creator>
      <dc:date>2019-06-13T17:25:12Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/566311#M159136</link>
      <description>&lt;P&gt;I'll see your thousand records and raise you 12,000. &lt;img id="smileywink" class="emoticon emoticon-smileywink" src="https://communities.sas.com/i/smilies/16x16_smiley-wink.png" alt="Smiley Wink" title="Smiley Wink" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have had to check whether any of roughly 13,000 records in one data set also appear in another data system where names are stored in very different forms. One had first, last, and middle names, plus things like Junior or II, all in a single field without any fixed order. And some had two last names related to parents.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2019 22:11:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/566311#M159136</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-06-14T22:11:37Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/566322#M159143</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/136483"&gt;@bikashten&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;That's what I am doing right now, but I posted in case there is a better alternative. It's not fun to do it manually for over a thousand records. Thanks, Bikash&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Agreed, but you can't always program your way out of bad data, and it's better to fix this at the source somehow. Using a number to identify companies instead of names is a start, as is having a cleaned database and a verification step as people enter data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
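&lt;P&gt;For cases where the cleanup does have to happen afterwards, the exact-first, fuzzy-second passes suggested earlier in this thread might be sketched roughly as below. This is only an illustration under assumptions: the names tiered, fkey, lkey, and tier are hypothetical, and the cutoff of 200 is an arbitrary starting point that would need tuning against real data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data tiered;
	set work.final;
	length tier $6;
	/* Normalize both names: uppercase, then keep alphabetic characters only */
	fkey = compress(upcase(fname), , 'ka');
	lkey = compress(upcase(lname), , 'ka');
	/* Pass 1: exact match on the normalized keys */
	if fkey = lkey then tier = 'exact';
	/* Pass 2: generalized edit distance; a lower cost means a closer match */
	/* (200 is an assumed cutoff, not a recommended value) */
	else if compged(fkey, lkey) le 200 then tier = 'fuzzy';
	/* Everything else is left for manual review */
	else tier = 'review';
run;

/* Count how many records landed in each tier */
proc freq data=tiered;
	tables tier;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Further passes with SOUNDEX() or COMPLEV() would slot in as extra branches before the final one.&lt;/P&gt;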
&lt;P&gt;Trying to clean up the mess afterwards is always more work.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2019 23:00:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/566322#M159143</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-06-14T23:00:14Z</dc:date>
    </item>
  </channel>
</rss>

