<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best approach to classifying records based on manually entered text description. in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/713214#M8565</link>
    <description>&lt;P&gt;If you are licensed for SAS Text Miner, check out the Text Rule Builder node, which creates Boolean rules from small subsets of terms to predict a categorical target variable.&amp;nbsp; &lt;SPAN&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="xis-windowItem"&gt;Text Rule Builder&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;node generates an ordered set of rules that together are useful in describing and predicting a target variable.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;There is &lt;A href="https://go.documentation.sas.com/?docsetId=tmref&amp;amp;docsetTarget=p0wpl4hhgmrxmdn1kx8euu348kgx.htm&amp;amp;docsetVersion=15.2&amp;amp;locale=en" target="_self"&gt;an example in the SAS Text Miner 15.2 Reference Help&lt;/A&gt; that shows you how to predict a categorical target variable using this node.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, see the following SAS Global Forum paper:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2650-2018.pdf" target="_self"&gt;Classifying and Predicting Spam Messages using Text Mining in&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2650-2018.pdf" target="_self"&gt;SAS® Enterprise Miner™&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In that paper, five other predictive models, including memory-based reasoning (MBR), logistic regression, decision tree, random forest, and neural network, were built, and their performance was compared with that of the Text Rule Builder model.&amp;nbsp; The best model was then used to classify the messages as spam or ham (non-spam).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;-Ann&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jan 2021 20:36:20 GMT</pubDate>
    <dc:creator>AnnKuo</dc:creator>
    <dc:date>2021-01-21T20:36:20Z</dc:date>
    <item>
      <title>Best approach to classifying records based on manually entered text description.</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711503#M8556</link>
      <description>&lt;P&gt;I have been writing traditional SAS code to classify building permit records according to whether or not they are new residential construction or demolitions. There are a couple of categorical fields that help, but most of the information comes from a long, manually entered text description.&amp;nbsp; Up until now I have been writing code that searches that string for keywords and phrases and makes decisions based on those.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am thinking there must be a better approach to this problem using Enterprise Miner or some other tools.&amp;nbsp; Something where I can manually classify some records until some sort of machine learning algorithm is trained to make the decisions for me.&amp;nbsp; I process about 50 files a year with 10,000 to 100,000 records each.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does anybody have suggestions on how I should approach this?&amp;nbsp; Maybe a paper?&lt;/P&gt;
&lt;P&gt;Thanks!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jan 2021 17:01:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711503#M8556</guid>
      <dc:creator>CurtisMackWSIPP</dc:creator>
      <dc:date>2021-01-14T17:01:17Z</dc:date>
    </item>
    <item>
      <title>Re: Best approach to classifying records based on manually entered text description.</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711535#M8559</link>
      <description>&lt;P&gt;I think you'll need Text Miner to help with processing the text. Another option, if you know the type of words you're looking for, is to do a word analysis: take the key words you've identified, try a basic logistic regression model on them, and treat that as your 'baseline'. Any model from there on out should give you better results than that baseline. This can very much be an ML problem, though.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jan 2021 18:47:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711535#M8559</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-01-14T18:47:12Z</dc:date>
    </item>
    <item>
      <title>Re: Best approach to classifying records based on manually entered text description.</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711689#M8561</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;You can visit&amp;nbsp;&lt;A href="https://www.lexjansen.com/" target="_blank"&gt;https://www.lexjansen.com/&lt;/A&gt;&amp;nbsp;to search all the papers related to this topic. Document classification can be an end in itself, mainly for data management. Another popular use is prediction. There is a text miner procedure in SAS HPA that you can consider. The primary benefit of using it is that it manages intermediate data sets better than Text Miner does. If you are coming to SAS from a more open-source background, Viya is a better place to start.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Jia&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jan 2021 15:48:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/711689#M8561</guid>
      <dc:creator>fierceanalytics</dc:creator>
      <dc:date>2021-01-15T15:48:15Z</dc:date>
    </item>
    <item>
      <title>Re: Best approach to classifying records based on manually entered text description.</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/713214#M8565</link>
      <description>&lt;P&gt;If you are licensed for SAS Text Miner, check out the Text Rule Builder node, which creates Boolean rules from small subsets of terms to predict a categorical target variable.&amp;nbsp; &lt;SPAN&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="xis-windowItem"&gt;Text Rule Builder&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;node generates an ordered set of rules that together are useful in describing and predicting a target variable.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;There is &lt;A href="https://go.documentation.sas.com/?docsetId=tmref&amp;amp;docsetTarget=p0wpl4hhgmrxmdn1kx8euu348kgx.htm&amp;amp;docsetVersion=15.2&amp;amp;locale=en" target="_self"&gt;an example in the SAS Text Miner 15.2 Reference Help&lt;/A&gt; that shows you how to predict a categorical target variable using this node.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, see the following SAS Global Forum paper:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2650-2018.pdf" target="_self"&gt;Classifying and Predicting Spam Messages using Text Mining in&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2650-2018.pdf" target="_self"&gt;SAS® Enterprise Miner™&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In that paper, five other predictive models, including memory-based reasoning (MBR), logistic regression, decision tree, random forest, and neural network, were built, and their performance was compared with that of the Text Rule Builder model.&amp;nbsp; The best model was then used to classify the messages as spam or ham (non-spam).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;-Ann&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2021 20:36:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Best-approach-to-classifying-records-based-on-manually-entered/m-p/713214#M8565</guid>
      <dc:creator>AnnKuo</dc:creator>
      <dc:date>2021-01-21T20:36:20Z</dc:date>
    </item>
  </channel>
</rss>

