<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Names matching in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379277#M276895</link>
    <description>&lt;P&gt;proc sql is much more efficient for fuzzy matches.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try this to study the results:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  
proc sql;
  select cus.COMP 
        ,com.CONM 
        ,compged(cus.COMP,com.CONM) as DISTANCE
  from BOX.CUS
      ,BOX.COM
  where compged(cus.COMP,com.CONM,&amp;amp;maxscore.,'il') &amp;lt; &amp;amp;maxscore.;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Without access to your data, it is difficult to know why your matches are not what you expect.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 06:24:30 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2017-07-26T06:24:30Z</dc:date>
    <item>
      <title>Names matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379269#M276894</link>
      <description>&lt;P&gt;Hi All&lt;/P&gt;&lt;P&gt;I have two datasets, one with about 4000 company names and the other with about 3200. I want to merge the two datasets using similarity in company names. I used the code below, but going by the gedscore I am only getting a few firms with similar names, which shouldn't be the case. The gedscore is extremely high for most of the matched names. Following prior studies, I should get a little over 1000 firms with similar names after the merge. What am I doing wrong in my code that prevents the right names from being merged? Is there anything I am missing in my code? NB: new user. Thanks in advance. Rgds EJAA.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%let maxscore=999;&lt;/P&gt;&lt;P&gt;data box.namesmerge;&lt;BR /&gt;set box.cus(rename=(comp=bdx)) end=eof1 nobs=nobs1;&lt;BR /&gt;do i = 1 to nobs1;&lt;BR /&gt;set box.com(rename=(conm=comp)) point=i;&lt;BR /&gt;gedscore=compged(bdx,comp,&amp;amp;maxscore,'i' );&lt;BR /&gt;if _n_ &amp;lt; i then do;&lt;BR /&gt;if gedscore &amp;lt; &amp;amp;maxscore then output box.namesmerge;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;keep bdx comp gedscore tic;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 05:02:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379269#M276894</guid>
      <dc:creator>EJAA</dc:creator>
      <dc:date>2017-07-26T05:02:24Z</dc:date>
    </item>
    <item>
      <title>Re: Names matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379277#M276895</link>
      <description>&lt;P&gt;proc sql is much more efficient for fuzzy matches.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try this to study the results:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  
proc sql;
  select cus.COMP 
        ,com.CONM 
        ,compged(cus.COMP,com.CONM) as DISTANCE
  from BOX.CUS
      ,BOX.COM
  where compged(cus.COMP,com.CONM,&amp;amp;maxscore.,'il') &amp;lt; &amp;amp;maxscore.;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
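&lt;P&gt;To go from studying the pairs to picking one candidate per name, a possible follow-up is sketched below. It assumes the macro variable maxscore from the original post is defined. The table names WORK.PAIRS and WORK.BEST are made up for illustration, and keeping only the lowest-distance candidate is an assumption about what is wanted, not a tested recipe for this data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
proc sql;
  /* Store every candidate pair under the cutoff, with its distance */
  create table WORK.PAIRS as
  select cus.COMP
        ,com.CONM
        ,compged(cus.COMP,com.CONM,&amp;amp;maxscore.,'il') as DISTANCE
  from BOX.CUS as cus
      ,BOX.COM as com
  where calculated DISTANCE &amp;lt; &amp;amp;maxscore.;

  /* Keep the closest candidate(s) for each COMP; ties are all kept */
  create table WORK.BEST as
  select COMP, CONM, DISTANCE
  from WORK.PAIRS
  group by COMP
  having DISTANCE = min(DISTANCE)
  order by DISTANCE, COMP;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The second query makes SAS remerge the minimum back onto each row, so expect a note about that in the log.&lt;/P&gt;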
&lt;P&gt;Without access to your data, it is difficult to know why your matches are not what you expect.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 06:24:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379277#M276895</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-07-26T06:24:30Z</dc:date>
    </item>
    <item>
      <title>Re: Names matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379408#M276896</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/136147"&gt;@EJAA&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Hi All&lt;/P&gt;
&lt;P&gt;I have two datasets, one with about 4000 company names and the other with about 3200. I want to merge the two datasets using similarity in company names. I used the code below, but going by the gedscore I am only getting a few firms with similar names, which shouldn't be the case. The gedscore is extremely high for most of the matched names. Following prior studies, I should get a little over 1000 firms with similar names after the merge. &lt;STRONG&gt;What am I doing wrong in my code that prevents the right names from being merged?&lt;/STRONG&gt; Is there anything I am missing in my code? NB: new user. Thanks in advance. Rgds EJAA.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Having done similar projects, I expect the problem is at least 50% data driven. People cam't spell (intentional example).&lt;/P&gt;
&lt;P&gt;So you end up needing a process that can match ABC Company to Abc co.&lt;/P&gt;
&lt;P&gt;My generic process, in the absence of more dedicated text processing software, is to:&lt;/P&gt;
&lt;P&gt;1) create new variables with some things standardized: only single spaces, all upper case, punctuation removed (ABC COMPANY not ABC, CO.), frequent special characters standardized (such as replacing &amp;amp; in Simon &amp;amp; Sons with 'and'), and common abbreviations such as Co., Inc., Ltd. expanded (the list varies with project type).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then start matching on those somewhat standardized variables.&lt;/P&gt;
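&lt;P&gt;As a rough illustration of the standardization step above, here is a minimal sketch for one of the two tables. The output table WORK.CUS_STD, the variable COMP_STD and its length of 200 are made-up names and values, and the abbreviation list is only a sample that would need adjusting to the actual data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data WORK.CUS_STD;
  set BOX.CUS;
  length COMP_STD $200;                             /* assumed to be long enough */
  COMP_STD = upcase(COMP);                          /* all upper case */
  COMP_STD = tranwrd(COMP_STD, '&amp;amp;', ' AND ');   /* spell out the ampersand */
  COMP_STD = compress(COMP_STD, , 'p');             /* remove punctuation */
  /* expand a few common abbreviations; the list varies with the project */
  COMP_STD = prxchange('s/\bCO\b/COMPANY/', -1, COMP_STD);
  COMP_STD = prxchange('s/\bINC\b/INCORPORATED/', -1, COMP_STD);
  COMP_STD = prxchange('s/\bLTD\b/LIMITED/', -1, COMP_STD);
  COMP_STD = compbl(strip(COMP_STD));               /* only single spaces */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The same steps would be applied to CONM in BOX.COM, and the comparison then run on the two standardized variables instead of the raw names.&lt;/P&gt;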
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One project I worked on, &lt;STRONG&gt;after&lt;/STRONG&gt; providing the data entry staff with a list of expected companies and the name to be entered, had 18 'spellings' for what should have been IBM (3 simple upper case letters). Most entertaining: I&amp;gt;B&amp;gt;M&amp;gt;; most puzzling were those that had to spell out International Business Machines.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 14:37:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Names-matching/m-p/379408#M276896</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-07-26T14:37:55Z</dc:date>
    </item>
  </channel>
</rss>

