<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Text clustering in SAS in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/262412#M9544</link>
    <description>If an academic at your university teaches with SAS, they may have registered one or more courses with the AWS cloud-based SAS OnDemand for Academics. Depending on the course, it might include SAS OnDemand Enterprise Miner with the Text Miner add-on. An instructor can upload your data, and as a registered OnDemand student you could then use Text Miner.</description>
    <pubDate>Fri, 08 Apr 2016 13:00:15 GMT</pubDate>
    <dc:creator>Damien_Mather</dc:creator>
    <dc:date>2016-04-08T13:00:15Z</dc:date>
    <item>
      <title>Text clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/252683#M9541</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm a student in my last year of university, and I'm working on some analysis for my bachelor's thesis.&lt;BR /&gt;&lt;BR /&gt;I'm analyzing&amp;nbsp;a moderately big dataset (20,000 rows for now, but I only have around 1% of the full data set) that includes a variable for item descriptions (e.g. "Cotton short-sleeved t-shirts"). So the question is: does SAS University Edition (basically Base SAS and SAS/STAT) have the capability of clustering text in any meaningful way? I'm looking to create a small number of categories for these items without having to work out a categorization scheme myself and then go through each item to assign a category.&lt;BR /&gt;&lt;BR /&gt;If not, is it possible to get my hands on something like SAS Text Miner for free, or should I be looking for a solution elsewhere?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Feb 2016 08:03:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/252683#M9541</guid>
      <dc:creator>Plikis</dc:creator>
      <dc:date>2016-02-26T08:03:48Z</dc:date>
    </item>
    <item>
      <title>Re: Text clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/253661#M9542</link>
      <description>&lt;P&gt;Hi and welcome to the community! &amp;nbsp;I think there's a lot you can do with SAS University Edition - I use it all the time (and, as a bit of self-promotion, even blog every Friday about it - click on my Avatar to see a list :-)). &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Check out one of my posts &lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/PROC-SQL-Continued-Basic-Text-Analytics-Using-Song-Titles/ta-p/241007" target="_self"&gt;(here)&lt;/A&gt;&amp;nbsp;where I use PROC SQL and what are called "regular expressions" to do basic text analytics on Song Titles. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck - I love using PROC SQL and SAS University Edition, so please post back any other problems / questions you have!&lt;/P&gt;
&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 02:00:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/253661#M9542</guid>
      <dc:creator>DarthPathos</dc:creator>
      <dc:date>2016-03-02T02:00:12Z</dc:date>
    </item>
    <item>
      <title>Re: Text clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/254740#M9543</link>
      <description>&lt;P&gt;Update to my previous reply:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was at a Local User Group yesterday and we were talking about preliminary text analysis; I mentioned SOUNDEX, and the presenter&amp;nbsp;said that COMPGED (which computes a generalized edit distance between two strings) has much better functionality.&amp;nbsp; I’d never used COMPGED, so I decided to dig into it and I must admit – I’m a convert!&amp;nbsp; I wanted to give you updated information so you too can see how cool this is.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’ve created a dummy data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2195i94CD045AA7B720AC/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="IMAGE10.png" title="IMAGE10.png" width="238" height="414" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I want to do is compare the rows in the TEXT column to see how similar the rows are.&amp;nbsp; To do this, I have to join the dataset to itself, and then I want to exclude those rows where the IDs are a match (because it would be the same row compared to itself).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here’s the code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
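&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
/* Self-join: pair every row's TEXT with every other row's TEXT */
select a.text, b.text,
  compged(a.text, b.text) as Compged1, /* generalized edit distance: lower = more similar */
  soundex(a.text) as Soundex1,         /* phonetic code of the first string */
  soundex(b.text) as Soundex2          /* phonetic code of the second string */
from work.import a, work.import b
where a.id &amp;lt;&amp;gt; b.id;                  /* skip a row compared with itself */
quit;
&lt;/CODE&gt;&lt;/PRE&gt;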
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is a portion of the results:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2197i40A3AFBBD2DEDFA8/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="image11.png" title="image11.png" width="597" height="197" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The lower the COMPGED score, the more similar the sentences.&amp;nbsp; What I find most impressive is that for sentences SOUNDEX treats as the same (the first two, for example), COMPGED picks up the slight differences and assigns a score of 100 (This versus Tis) and 200 (test versus taste).&lt;/P&gt;
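&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a minimal sketch of how those scores could be turned into candidate groups, the query below keeps each pair of rows only once and only when the COMPGED score is at or below a cutoff.&amp;nbsp; The cutoff of 200 and the output table name work.similar_pairs are assumptions made for illustration, not values from the results above; the cutoff would need tuning against real item descriptions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
/* Hypothetical output table: work.similar_pairs */
create table work.similar_pairs as
select a.id as id_a, b.id as id_b,
  a.text as text_a, b.text as text_b,
  compged(a.text, b.text) as ged    /* generalized edit distance */
from work.import a, work.import b
where a.id &amp;lt; b.id                 /* each pair once, no self-pairs */
  and calculated ged &amp;lt;= 200       /* assumed cutoff, tune for your data */
order by ged;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that a full self-join on 20,000 rows produces roughly 200 million pairs, so it may be worth trying this on a sample first.&lt;/P&gt;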
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So depending on what you need to do, you may want COMPGED, SOUNDEX, or both.&amp;nbsp; I’d be interested in seeing what you end up using and, if you try both, how the results differ!&lt;/P&gt;</description>
      <pubDate>Sat, 05 Mar 2016 16:40:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/254740#M9543</guid>
      <dc:creator>DarthPathos</dc:creator>
      <dc:date>2016-03-05T16:40:27Z</dc:date>
    </item>
    <item>
      <title>Re: Text clustering in SAS</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/262412#M9544</link>
      <description>If an academic at your university teaches with SAS, they may have registered one or more courses with the AWS cloud-based SAS OnDemand for Academics. Depending on the course, it might include SAS OnDemand Enterprise Miner with the Text Miner add-on. An instructor can upload your data, and as a registered OnDemand student you could then use Text Miner.</description>
      <pubDate>Fri, 08 Apr 2016 13:00:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Text-clustering-in-SAS/m-p/262412#M9544</guid>
      <dc:creator>Damien_Mather</dc:creator>
      <dc:date>2016-04-08T13:00:15Z</dc:date>
    </item>
  </channel>
</rss>

