<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pattern Match using SQL or HASH or any other way in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/254743#M48613</link>
    <description>&lt;P&gt;Update to my previous reply:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was at a Local User Group yesterday and we were talking about preliminary text analysis; I mentioned SOUNDEX, and the presenter said that COMPGED has much better functionality.&amp;nbsp; I’ve never used COMPGED, so decided to dig into it and I must admit – I’m a convert!&amp;nbsp; I wanted to give you updated information so you to can see how cool this is.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’ve created a dummy data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2198i068631FFEA078DD6/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="IMAGE10.png" title="IMAGE10.png" width="193" height="336" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I want to do is compare the rows in the TEXT column to see how similar the rows are.&amp;nbsp; To do this, I have to join the dataset to itself, and then I want to exclude those rows where the IDs are a match (because it would be the same row compared to itself).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here’s the code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
 
select a.text, b.text,
compged(a.text, b.text) as Compged1,
soundex(a.text) as Soundex1,
soundex(b.text) as Soundex2
from work.import a, work.import b
where a.id &amp;lt;&amp;gt; b.id;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is a portion of the results:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2199i575D11894874C35F/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="image11.png" title="image11.png" width="564" height="186" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The lower the COMPGED score, the more similar the sentences.&amp;nbsp; What I find most impressive is that sentences that SOUNDEX says are the same (the first two for example) COMPGED knows there are slight differences, so assigns a score of 100 (This versus Tis) and 200 (test versus taste).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So depending on what you need to do, COMPGED and / or SOUNDEX may be needed.&amp;nbsp; I’d be interested in seeing what you end up using and if you try both, how the results differ!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 05 Mar 2016 16:42:25 GMT</pubDate>
    <dc:creator>DarthPathos</dc:creator>
    <dc:date>2016-03-05T16:42:25Z</dc:date>
    <item>
      <title>Pattern Match using SQL or HASH or any other way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/253335#M48190</link>
      <description>&lt;P&gt;I have two datasets and below is simple example where two datasets exist "dataset a" and "dataset b" now my objective is to do lookup using a pattern string from "dataset b" and look for matches in "dataset a" ...Tried with SQL&amp;nbsp; as below however doesn't seems to work&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC SQL;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;CREATE TABLE LOOKUP AS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;SELECT A.*&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;FROM TMS A,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;FILT B&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;WHERE A.DSN LIKE B.DSN;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;QUIT;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dataset A (variable name is DSN)&lt;BR /&gt;ZZZZ.TESTRV.CATT.LENGTH.WORMTEST&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Dataset B(Variable name is DSN)&lt;BR /&gt;zzzz.TEST%%.CATT.LENGTH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2016 03:09:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/253335#M48190</guid>
      <dc:creator>rkumar23</dc:creator>
      <dc:date>2016-03-01T03:09:47Z</dc:date>
    </item>
    <item>
      <title>Re: Pattern Match using SQL or HASH or any other way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/253658#M48300</link>
      <description>&lt;P&gt;Hi! &amp;nbsp;Welcome to the community - hopefully you will get a lot out of participating here.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think the solution to your problem is the SOUNDEX function that can be used in PROC SQL. &amp;nbsp;Check out this&amp;nbsp;&lt;A href="http://www2.sas.com/proceedings/sugi29/072-29.pdf" target="_self"&gt;great paper&lt;/A&gt;&amp;nbsp;- I have this paper as a bookmark on my computer because I use it all the time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basically, SOUNDEX takes your string (say McDonald) and converts it to alphanumeric based on the letters / letter combinations. &amp;nbsp;Say for arguement's sake it converts it to M345. &amp;nbsp;If you compared this to MacDonald, you would get a match&amp;nbsp;- the two are interpreted as the same. &amp;nbsp;You'd need to do further comparisons (first name, date of birth, etc) but at least it gets you started. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've written a blog post (using SAS University Edition, but the concepts are identical) - check it out&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/PROC-SQL-Continued-Basic-Text-Analytics-Using-Song-Titles/ta-p/241007" target="_blank"&gt;here&lt;/A&gt;&amp;nbsp;and please let me know if you need further help!!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC SQL is a great tool and I'm happy to help out when I can!&lt;/P&gt;
&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 01:54:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/253658#M48300</guid>
      <dc:creator>DarthPathos</dc:creator>
      <dc:date>2016-03-02T01:54:56Z</dc:date>
    </item>
    <item>
      <title>Re: Pattern Match using SQL or HASH or any other way</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/254743#M48613</link>
      <description>&lt;P&gt;Update to my previous reply:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was at a Local User Group yesterday and we were talking about preliminary text analysis; I mentioned SOUNDEX, and the presenter said that COMPGED has much better functionality.&amp;nbsp; I’ve never used COMPGED, so decided to dig into it and I must admit – I’m a convert!&amp;nbsp; I wanted to give you updated information so you to can see how cool this is.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’ve created a dummy data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2198i068631FFEA078DD6/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="IMAGE10.png" title="IMAGE10.png" width="193" height="336" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I want to do is compare the rows in the TEXT column to see how similar the rows are.&amp;nbsp; To do this, I have to join the dataset to itself, and then I want to exclude those rows where the IDs are a match (because it would be the same row compared to itself).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here’s the code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
 
select a.text, b.text,
compged(a.text, b.text) as Compged1,
soundex(a.text) as Soundex1,
soundex(b.text) as Soundex2
from work.import a, work.import b
where a.id &amp;lt;&amp;gt; b.id;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is a portion of the results:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2199i575D11894874C35F/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="image11.png" title="image11.png" width="564" height="186" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The lower the COMPGED score, the more similar the sentences.&amp;nbsp; What I find most impressive is that sentences that SOUNDEX says are the same (the first two for example) COMPGED knows there are slight differences, so assigns a score of 100 (This versus Tis) and 200 (test versus taste).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So depending on what you need to do, COMPGED and / or SOUNDEX may be needed.&amp;nbsp; I’d be interested in seeing what you end up using and if you try both, how the results differ!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Mar 2016 16:42:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pattern-Match-using-SQL-or-HASH-or-any-other-way/m-p/254743#M48613</guid>
      <dc:creator>DarthPathos</dc:creator>
      <dc:date>2016-03-05T16:42:25Z</dc:date>
    </item>
  </channel>
</rss>

