<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Contextual Analysis - terms to keep in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-Contextual-Analysis-terms-to-keep/m-p/242055#M9480</link>
    <description>&lt;P&gt;Update to my own question: I later found this in the CA user guide (chap. 1, page 2), which probably explains this behavior:&lt;/P&gt;&lt;P&gt;"&lt;EM&gt;By default, words that provide little or no value are excluded from analysis. Examples of these words include the articles a, an, and the and conjunctions such as and, or, and but.&lt;/EM&gt; &lt;FONT color="#0000ff"&gt;&lt;EM&gt;Other terms that are specific to your document collection but provide little or no value are also identified and excluded&lt;/EM&gt;&lt;/FONT&gt;."&lt;/P&gt;&lt;P&gt;One should be aware of this: because we often use a training set, terms that may be important can end up being excluded when they are not well represented in the training data.&lt;/P&gt;</description>
    <pubDate>Wed, 06 Jan 2016 15:31:07 GMT</pubDate>
    <dc:creator>Erik_Zencos</dc:creator>
    <dc:date>2016-01-06T15:31:07Z</dc:date>
    <item>
      <title>SAS Contextual Analysis - terms to keep</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-Contextual-Analysis-terms-to-keep/m-p/209683#M9479</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I noticed something strange in SAS CA that I hope someone can clarify for me. In SAS CA you can optionally drag words over to the "Dropped terms" list. In my project, I have not used this option yet. However, the automatically generated output dataset "all_terms_ds" (which contains all the terms in the text) has a field called "keep" with values "Y" or "N". Many of the terms have the value "N", and I suspect these terms will be dropped from the analysis. There do not seem to be any settings in CA where I can control this, and I can’t find any logic behind it (except that the terms occur in only a few documents, but terms that occur a specific number of times can have either value, Y or N). Does anybody know whether these terms are actually dropped, what the rules behind this are, and whether these rules can be changed?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 14:57:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-Contextual-Analysis-terms-to-keep/m-p/209683#M9479</guid>
      <dc:creator>Erik_Zencos</dc:creator>
      <dc:date>2015-08-28T14:57:29Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Contextual Analysis - terms to keep</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-Contextual-Analysis-terms-to-keep/m-p/242055#M9480</link>
      <description>&lt;P&gt;Update to my own question: I later found this in the CA user guide (chap. 1, page 2), which probably explains this behavior:&lt;/P&gt;&lt;P&gt;"&lt;EM&gt;By default, words that provide little or no value are excluded from analysis. Examples of these words include the articles a, an, and the and conjunctions such as and, or, and but.&lt;/EM&gt; &lt;FONT color="#0000ff"&gt;&lt;EM&gt;Other terms that are specific to your document collection but provide little or no value are also identified and excluded&lt;/EM&gt;&lt;/FONT&gt;."&lt;/P&gt;&lt;P&gt;One should be aware of this: because we often use a training set, terms that may be important can end up being excluded when they are not well represented in the training data.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jan 2016 15:31:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-Contextual-Analysis-terms-to-keep/m-p/242055#M9480</guid>
      <dc:creator>Erik_Zencos</dc:creator>
      <dc:date>2016-01-06T15:31:07Z</dc:date>
    </item>
  </channel>
</rss>

