<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fuzzy Matching in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352797#M82280</link>
    <description/>
    <pubDate>Mon, 24 Apr 2017 11:59:02 GMT</pubDate>
    <dc:creator>thomp7050</dc:creator>
    <dc:date>2017-04-24T11:59:02Z</dc:date>
    <item>
      <title>Fuzzy Matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352785#M82273</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Hopefully somebody can help me. I have two data tables. Table 1 has a single column containing a list of approximately 10,000 business names such as "AstraZeneca Group", "Zurich Insurance Group", "Virgin Money PLC", etc. Table 2 has a column that holds free text. I want to run a fuzzy match of the free-text column in table 2 against the names stored in table 1. For example, if "AstraZeneca Group" occurs in the free-text column of table 2, create a new column "AstraZeneca" with a value of 1. Is there a slick and efficient way to screen the free-text column (in table 2) against such a large list of names? I am using Base SAS.&lt;/P&gt;&lt;P&gt;Thanks for your support.&lt;/P&gt;&lt;P&gt;Regards&lt;BR /&gt;&lt;BR /&gt;Chris&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 11:05:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352785#M82273</guid>
      <dc:creator>cmoore</dc:creator>
      <dc:date>2017-04-24T11:05:02Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy Matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352797#M82280</link>
      <description>&lt;P&gt;Here is a start. I would also try converting both columns to lower case before comparing, and consider SOUNDEX for misspellings.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards delimiter=',' missover;
length ref $ 1000;
input ref $;
cards;
AstraZeneca Group,
MyGroup Rules
;
run;

data want;
infile cards delimiter=',' missover;  
length ref $ 1000;
input ref $;
cards;
I love the AstraZeneca Group!!!!!  It is so awesome!,
Oh yeah I totally love the AstraZeneca Group they rule!,
Have you ever tried MyGroup Rules?  It is sweeeeeeeet!
;

PROC SQL;
CREATE TABLE JOINS AS 
SELECT HAVE.REF AS REF1, WANT.REF AS REF2 FROM HAVE
LEFT JOIN WANT ON TRIM(WANT.REF) CONTAINS TRIM(HAVE.REF);
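
/* Sketch of the lower-case suggestion above (an assumption, not tested on
   real data): LOWCASE on both sides makes the CONTAINS test ignore case.
   JOINS_NOCASE is a hypothetical table name used only for illustration. */
CREATE TABLE JOINS_NOCASE AS 
SELECT HAVE.REF AS REF1, WANT.REF AS REF2 FROM HAVE
LEFT JOIN WANT ON LOWCASE(TRIM(WANT.REF)) CONTAINS LOWCASE(TRIM(HAVE.REF));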
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 24 Apr 2017 11:59:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352797#M82280</guid>
      <dc:creator>thomp7050</dc:creator>
      <dc:date>2017-04-24T11:59:02Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy Matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352811#M82288</link>
      <description>&lt;P&gt;There are two parts to your question. The first is parsing nouns and noun phrases from the text. I think you'll have to go outside of SAS for that part and use one of the natural language processing (NLP) packages. Take a look at:&amp;nbsp;&lt;A href="http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-typed-parser" target="_blank"&gt;http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-typed-parser&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As for the second part, I have found the COMPGED function to be the most powerful for such tasks. One example of how to incorporate it in your code is at: &lt;A href="http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck" target="_blank"&gt;http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck&lt;/A&gt;&lt;/P&gt;
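&lt;P&gt;A minimal sketch of calling COMPGED, assuming the candidate name has already been extracted from the free text; the misspelled string below is made up for illustration. COMPGED returns a generalized edit distance, so a lower score means a closer match and 0 means the strings are identical. Any cutoff for accepting a match is an assumption you would need to tune on your own data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length name candidate $ 100;
  name      = 'AstraZeneca Group';   /* entry from the business-name list */
  candidate = 'Astra Zeneca Grp';    /* hypothetical misspelled phrase */
  /* LOWCASE makes the comparison ignore case;
     STRIP drops leading and trailing blanks */
  score = compged(lowcase(strip(name)), lowcase(strip(candidate)));
  put name= candidate= score=;       /* writes the values to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;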
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2017 12:30:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-Matching/m-p/352811#M82288</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2017-04-24T12:30:02Z</dc:date>
    </item>
  </channel>
</rss>

