<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Seven tricky sentences for NLP and text mining algorithms in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Seven-tricky-sentences-for-NLP-and-text-mining-algorithms/m-p/126256#M9331</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A href="http://www.analyticbridge.com/group/textmining/forum/topics/seven-tricky-sentences-for-nlp-and-text-mining-algorithms" title="http://www.analyticbridge.com/group/textmining/forum/topics/seven-tricky-sentences-for-nlp-and-text-mining-algorithms"&gt;Seven tricky sentences for NLP and text mining algorithms - AnalyticBridge&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-weight: normal;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-weight: normal;"&gt;Posted by Mirko Krivanek on &lt;/SPAN&gt;&lt;A class="active_link" href="http://www.analyticbridge.com/group/textmining" title="http://www.analyticbridge.com/group/textmining"&gt;Text Mining - AnalyticBridge&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I thought that these were very interesting.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;"A land of milk and honey" becomes "A land of Milken Honey" (algorithm trained on the Wall Street Journal from the 1980s, where Michael Milken was mentioned much more often than milk)&lt;/LI&gt;&lt;LI&gt;"She threw up her dinner" vs. "She threw up her hands"&lt;/LI&gt;&lt;LI&gt;"I ate a tomato with salt" vs. "I ate a tomato with my mother" or "I ate a tomato with a fork"&lt;/LI&gt;&lt;LI&gt;Words ending with -ing, e.g. "They were entertaining people"&lt;/LI&gt;&lt;LI&gt;"He washed and dried the dishes" vs. "He drank and smoked cigars" (in the latter case he did not drink cigars)&lt;/LI&gt;&lt;LI&gt;"The lamb was ready to eat" vs. "Was the lamb hungry and wanting some grass?"&lt;/LI&gt;&lt;LI&gt;Words with multiple meanings (e.g. a bay can be a color, a type of window or a body of water)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To the seven above, I would add words that are often used interchangeably but are &lt;EM&gt;intended &lt;/EM&gt;to mean two different things. 
For example, in the &lt;A href="https://dec.usaid.gov/"&gt;Development Experience Clearinghouse&lt;/A&gt;, &lt;STRONG&gt;evaluations&lt;/STRONG&gt; are meant to be documents that analyze either the performance of a project or the impact a project has made on a sector or geographic location. &lt;STRONG&gt;Assessments &lt;/STRONG&gt;are supposed to be documents that analyze the conditions of a particular sector or geographic location &lt;EM&gt;before &lt;/EM&gt;a project or program takes place. And yet the terms are often used indiscriminately within the documents themselves. A human can look at a document and discern whether it is an assessment or an evaluation, but it's very difficult to write rules for the SAS Content Categorization Studio that capture the difference.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What linguistic challenges do others have when writing profile rules or text mining algorithms?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 04 Mar 2013 00:27:53 GMT</pubDate>
    <dc:creator>JuliaM</dc:creator>
    <dc:date>2013-03-04T00:27:53Z</dc:date>
    <item>
      <title>Seven tricky sentences for NLP and text mining algorithms</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Seven-tricky-sentences-for-NLP-and-text-mining-algorithms/m-p/126256#M9331</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A href="http://www.analyticbridge.com/group/textmining/forum/topics/seven-tricky-sentences-for-nlp-and-text-mining-algorithms" title="http://www.analyticbridge.com/group/textmining/forum/topics/seven-tricky-sentences-for-nlp-and-text-mining-algorithms"&gt;Seven tricky sentences for NLP and text mining algorithms - AnalyticBridge&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-weight: normal;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-weight: normal;"&gt;Posted by Mirko Krivanek on &lt;/SPAN&gt;&lt;A class="active_link" href="http://www.analyticbridge.com/group/textmining" title="http://www.analyticbridge.com/group/textmining"&gt;Text Mining - AnalyticBridge&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I thought that these were very interesting.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;"A land of milk and honey" becomes "A land of Milken Honey" (algorithm trained on the Wall Street Journal from the 1980s, where Michael Milken was mentioned much more often than milk)&lt;/LI&gt;&lt;LI&gt;"She threw up her dinner" vs. "She threw up her hands"&lt;/LI&gt;&lt;LI&gt;"I ate a tomato with salt" vs. "I ate a tomato with my mother" or "I ate a tomato with a fork"&lt;/LI&gt;&lt;LI&gt;Words ending with -ing, e.g. "They were entertaining people"&lt;/LI&gt;&lt;LI&gt;"He washed and dried the dishes" vs. "He drank and smoked cigars" (in the latter case he did not drink cigars)&lt;/LI&gt;&lt;LI&gt;"The lamb was ready to eat" vs. "Was the lamb hungry and wanting some grass?"&lt;/LI&gt;&lt;LI&gt;Words with multiple meanings (e.g. a bay can be a color, a type of window or a body of water)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To the seven above, I would add words that are often used interchangeably but are &lt;EM&gt;intended &lt;/EM&gt;to mean two different things. 
For example, in the &lt;A href="https://dec.usaid.gov/"&gt;Development Experience Clearinghouse&lt;/A&gt;, &lt;STRONG&gt;evaluations&lt;/STRONG&gt; are meant to be documents that analyze either the performance of a project or the impact a project has made on a sector or geographic location. &lt;STRONG&gt;Assessments &lt;/STRONG&gt;are supposed to be documents that analyze the conditions of a particular sector or geographic location &lt;EM&gt;before &lt;/EM&gt;a project or program takes place. And yet the terms are often used indiscriminately within the documents themselves. A human can look at a document and discern whether it is an assessment or an evaluation, but it's very difficult to write rules for the SAS Content Categorization Studio that capture the difference.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What linguistic challenges do others have when writing profile rules or text mining algorithms?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 04 Mar 2013 00:27:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Seven-tricky-sentences-for-NLP-and-text-mining-algorithms/m-p/126256#M9331</guid>
      <dc:creator>JuliaM</dc:creator>
      <dc:date>2013-03-04T00:27:53Z</dc:date>
    </item>
    <item>
      <title>Re: Seven tricky sentences for NLP and text mining algorithms</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Seven-tricky-sentences-for-NLP-and-text-mining-algorithms/m-p/126257#M9332</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;That's very interesting.&amp;nbsp; Cases like these are why a good training corpus is necessary.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It's funny how #3 and #6 seem fixable with a slight change of wording:&amp;nbsp; "I ate tomato with salt" and "The lamb was cooked and ready to eat".&amp;nbsp; The others are not as easily fixed. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If the classification rules are difficult to write, perhaps the corpus can be modified.&amp;nbsp; This is rarely possible, but some projects can entertain it as an option.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 04 Mar 2013 16:00:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Seven-tricky-sentences-for-NLP-and-text-mining-algorithms/m-p/126257#M9332</guid>
      <dc:creator>jaredp</dc:creator>
      <dc:date>2013-03-04T16:00:17Z</dc:date>
    </item>
  </channel>
</rss>

