<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Match text strings in separate datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331658#M271946</link>
    <description>&lt;P&gt;I have three datasets: a list of organization and company names (dataset A), such as Walmart, AT &amp;amp; T, Children's Place, and Target; a list of popular people's names (dataset B); and free text such as "Harry met Sally at Walmart." (dataset C).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I match the string "Walmart" in dataset C against the company names in dataset A and replace it with "comp_name"?&lt;/P&gt;&lt;P&gt;I'd also like to match "Harry" and "Sally" in dataset C against dataset B and replace them with "people_name".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Fri, 10 Feb 2017 20:04:14 GMT</pubDate>
    <dc:creator>vasasuser</dc:creator>
    <dc:date>2017-02-10T20:04:14Z</dc:date>
    <item>
      <title>Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331658#M271946</link>
      <description>&lt;P&gt;I have three datasets: a list of organization and company names (dataset A), such as Walmart, AT &amp;amp; T, Children's Place, and Target; a list of popular people's names (dataset B); and free text such as "Harry met Sally at Walmart." (dataset C).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I match the string "Walmart" in dataset C against the company names in dataset A and replace it with "comp_name"?&lt;/P&gt;&lt;P&gt;I'd also like to match "Harry" and "Sally" in dataset C against dataset B and replace them with "people_name".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 20:04:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331658#M271946</guid>
      <dc:creator>vasasuser</dc:creator>
      <dc:date>2017-02-10T20:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331665#M271947</link>
      <description>&lt;P&gt;Do you need to account for spelling variants such as:&lt;/P&gt;
&lt;P&gt;Wal-Mart vs walmart vs WALMART vs Wal Mart vs wal mart&lt;/P&gt;
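&lt;P&gt;One common way to make variants like these compare equal is to normalize both the dictionary entries and the text before matching: lowercase everything and drop every character that is not a letter or digit. The Python sketch below only illustrates the idea (in SAS the rough equivalent is LOWCASE combined with COMPRESS keeping only letters and digits); the function name normalize is hypothetical.&lt;/P&gt;

```python
import re


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits, so that
    'Wal-Mart', 'WALMART' and 'wal mart' all become 'walmart'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


variants = ["Wal-Mart", "walmart", "WALMART", "Wal Mart", "wal mart"]
print({normalize(v) for v in variants})  # {'walmart'}
```

&lt;P&gt;Normalizing this aggressively can also make unrelated strings collide, so the matched results are worth reviewing.&lt;/P&gt;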
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please post more sample data and code if you want help with the code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Otherwise, consider searching this forum for the term "fuzzy matching".&lt;/P&gt;
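&lt;P&gt;As a rough illustration of what fuzzy matching means here, the Python sketch below uses the standard library's difflib.get_close_matches, which accepts a dictionary entry when its similarity ratio to the word is at least a cutoff. The 0.8 cutoff, the small company list, and the helper name fuzzy_lookup are assumptions for illustration; SAS has its own string-distance functions (for example COMPGED and SPEDIS) that serve a similar purpose.&lt;/P&gt;

```python
import difflib

# Hypothetical company dictionary, already normalized (lowercase, no spaces or punctuation).
companies = ["walmart", "target", "childrensplace"]


def fuzzy_lookup(word, dictionary, cutoff=0.8):
    """Return the closest dictionary entry whose similarity ratio to `word`
    is at least `cutoff`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(fuzzy_lookup("Walmrt", companies))  # walmart
print(fuzzy_lookup("Harry", companies))   # None
```

&lt;P&gt;A lower cutoff catches more misspellings but also more false positives, such as short person names that happen to resemble a store name.&lt;/P&gt;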
</description>
      <pubDate>Fri, 10 Feb 2017 20:22:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331665#M271947</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-02-10T20:22:27Z</dc:date>
    </item>
    <item>
      <title>Re: Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331673#M271948</link>
      <description>&lt;P&gt;Also, what should the final result look like?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do any of your company names happen to be people's names? I can think of multiple stores where all or part of the name could show up in your person-name data. How do you intend to handle that?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 20:31:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331673#M271948</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-02-10T20:31:31Z</dc:date>
    </item>
    <item>
      <title>Re: Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331708#M271949</link>
      <description>&lt;P&gt;Thank you all for the replies and good questions; they made me think more about this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First, I believe this is a fuzzy string match that ignores case, blanks, etc.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Second, since company names sometimes show up in person names, I think it's better to replace every match with the same placeholder, producing a result like this:&lt;/P&gt;&lt;P&gt;Original text: "Harry met Sally at Walmart."&lt;/P&gt;&lt;P&gt;Edited text: "edit_name met edit_name at edit_name."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The basic idea is to remove every person name, organization/company name, and ideally every location (street names or street suffixes/abbreviations) from a sample text file. I built my own dictionary by compiling an Excel file of company names and people's names from the census and other online sources, and then set up a sample text dataset with 10 records: 5 contain names and locations, and 5 don't. I'd like to match the sample text dataset against my dictionaries and replace every match with "edit_name".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help with that?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 21:22:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331708#M271949</guid>
      <dc:creator>vasasuser</dc:creator>
      <dc:date>2017-02-10T21:22:21Z</dc:date>
    </item>
    <item>
      <title>Re: Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331711#M271950</link>
      <description>&lt;P&gt;Here's a quick approach, then, if you have a database to look the words up in.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Split the text into individual words, using the SCAN function to separate out the strings.&lt;/P&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
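&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;SentenceID&lt;/TH&gt;&lt;TH&gt;Word&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Harry&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;met&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Sally&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;at&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Walmart&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Jack&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;and&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Jill&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Went&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Up&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;a&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;hill&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;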
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then merge this against your name databases to categorize each word, and, if desired, reassemble the words into sentences afterwards.&lt;/P&gt;
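&lt;P&gt;The split, look-up, and reassemble steps can be sketched in Python as below. This is an illustrative translation of the approach rather than SAS code: str.split stands in for SCAN, a set lookup stands in for the merge, the tiny name sets and the helper name redact are hypothetical, and matching is exact after lowercasing and stripping punctuation, so multi-word names such as "Children's Place" would need extra handling.&lt;/P&gt;

```python
import string

# Hypothetical dictionaries; in practice these would come from datasets A and B.
company_names = {"walmart", "target"}
people_names = {"harry", "sally", "jack", "jill"}
lookup = company_names | people_names


def redact(sentence, names, placeholder="edit_name"):
    """Split a sentence into words, replace any word found in `names`
    with `placeholder`, and rejoin the words into a sentence."""
    out = []
    for word in sentence.split():
        core = word.strip(string.punctuation)  # "Walmart." becomes "Walmart"
        if core.lower() in names:
            # Keep surrounding punctuation such as the final period.
            word = word.replace(core, placeholder, 1)
        out.append(word)
    return " ".join(out)


print(redact("Harry met Sally at Walmart.", lookup))
# edit_name met edit_name at edit_name.
```

&lt;P&gt;Using a single placeholder for every dictionary hit sidesteps the question of whether a word is a person or a company, which matters when the two dictionaries overlap.&lt;/P&gt;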
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, if you have the budget, look into the Text Analytics tools available in SAS Enterprise Miner.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 21:26:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331711#M271950</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-02-10T21:26:53Z</dc:date>
    </item>
    <item>
      <title>Re: Match text strings in separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331889#M271951</link>
      <description>Rather than thinking of this as a MERGE, think of solutions to this problem as a JOIN.&lt;BR /&gt;That leads you to PROC SQL with fuzzy logic in the ON conditions of the join.</description>
      <pubDate>Sat, 11 Feb 2017 23:39:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Match-text-strings-in-seperate-datasets/m-p/331889#M271951</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2017-02-11T23:39:54Z</dc:date>
    </item>
  </channel>
</rss>

