<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract Unigram, Bigram, Trigram etc., from a Text field in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/698805#M213738</link>
    <description>The documentation should always be a reference. Are the examples there not clear?&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.5&amp;amp;docsetId=lestmtsref&amp;amp;docsetTarget=p08do6szetrxe2n136ush727sbuo.htm&amp;amp;locale=en#n1dhnzh7kwwgxcn16qw08x7wc76m" target="_blank"&gt;https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.5&amp;amp;docsetId=lestmtsref&amp;amp;docsetTarget=p08do6szetrxe2n136ush727sbuo.htm&amp;amp;locale=en#n1dhnzh7kwwgxcn16qw08x7wc76m&lt;/A&gt;</description>
    <pubDate>Fri, 13 Nov 2020 20:29:41 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2020-11-13T20:29:41Z</dc:date>
    <item>
      <title>Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697152#M213035</link>
      <description>&lt;P&gt;I have a text field with several sentences. I am trying to do the following:&lt;/P&gt;
&lt;P&gt;1) separate all words --&amp;gt; get frequency counts on each word&lt;/P&gt;
&lt;P&gt;2) separate 2 consecutive words --&amp;gt; get frequency counts on each bigram&lt;/P&gt;
&lt;P&gt;3) separate 3 consecutive words --&amp;gt; get frequency counts on each trigram&lt;/P&gt;
&lt;P&gt;and so on...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think I have only partially succeeded with step 1. I am looking for base SAS code that can do the bigram, trigram, etc. counts.&lt;/P&gt;
&lt;P&gt;Also, I want to use SOUNDEX, SPEDIS, etc. to group words together even if there is a misspelling.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can anyone give me some pointers (I am not looking for the entire solution) to solve this using SAS code?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2020 14:42:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697152#M213035</guid>
      <dc:creator>Venkat4</dc:creator>
      <dc:date>2020-11-06T14:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697176#M213043</link>
      <description>&lt;P&gt;It might help to provide some example data and what you expect the result for that example to be.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For one thing, you need to define what counts as consecutive words for your purpose. Are "words" separated by a comma consecutive? By a period? By some character like @ # $ %?&lt;/P&gt;
&lt;P&gt;When counting, is case to be considered? Would "This street" and "this street" be in the same count?&lt;/P&gt;
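&lt;P&gt;To make these choices concrete, here is a minimal sketch rather than a definitive answer. It assumes a hypothetical data set HAVE with one character variable TEXT, treats blank, comma and period as word separators, and ignores case. Change the delimiter list or drop LOWCASE if your definition differs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical input: one row, one text variable named TEXT */
data have;
  length text $200;
  text = "This street, this Street. That street";
run;

data words;
  set have;
  length word $40;
  /* Assumption: blank, comma and period all separate words */
  do pos = 1 to countw(text, ' ,.');
    /* LOWCASE makes "This" and "this" count as the same word */
    word = lowcase(scan(text, pos, ' ,.'));
    output;
  end;
  keep pos word;
run;

/* One row per distinct word with its frequency */
proc freq data=words;
  tables word / nocum nopercent;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With these assumptions "This street" and "this street" fall into the same count: the sample row gives street 3, this 2, that 1.&lt;/P&gt;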
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another very important bit might be the "and so on". Just how long are your phrases, in terms of your "word" definition?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using SOUNDEX may well be questionable, since a long multi-syllable word can produce the same SOUNDEX code as several different short words.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2020 15:40:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697176#M213043</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-11-06T15:40:43Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697188#M213046</link>
      <description>&lt;P&gt;Thank you!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can make them all uppercase or lowercase. Also, I have a list of stop words, and I want to remove all stop words first.&lt;/P&gt;
&lt;P&gt;Here is an example of what I am looking for, using the sample text below, with all words separated by spaces.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"Lorem Ipsum text&amp;nbsp;is simply dummy text. "&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unigram - Word and counts -&lt;/P&gt;
&lt;P&gt;Lorem - 1&lt;/P&gt;
&lt;P&gt;Ipsum - 1&lt;/P&gt;
&lt;P&gt;text - 2&lt;/P&gt;
&lt;P&gt;simply - 1&lt;/P&gt;
&lt;P&gt;dummy - 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bigram - two words and counts -&lt;/P&gt;
&lt;P&gt;Lorem Ipsum - 1&lt;/P&gt;
&lt;P&gt;Ipsum text - 1&lt;/P&gt;
&lt;P&gt;text simply - 1&lt;/P&gt;
&lt;P&gt;simply dummy - 1&lt;/P&gt;
&lt;P&gt;dummy text - 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Trigrams will be derived the same way.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another goal is to find misspellings of words and group them together (the table will have 3 columns: group_word, different_variations, count), so that when I search for the correct word or phrase in newer data I can use that group and include all variations instead of only the correct spelling.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2020 16:31:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697188#M213046</guid>
      <dc:creator>Venkat4</dc:creator>
      <dc:date>2020-11-06T16:31:28Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697191#M213048</link>
      <description>&lt;P&gt;This shows how to separate the words:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/statgeek/SAS-Tutorials/blob/master/text_analysis.sas" target="_blank"&gt;https://github.com/statgeek/SAS-Tutorials/blob/master/text_analysis.sas&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For bigram/trigram I suggest using an array.&lt;/P&gt;
&lt;P&gt;Here's a tutorial on using arrays in SAS:&lt;BR /&gt;&lt;A href="https://stats.idre.ucla.edu/sas/seminars/sas-arrays/" target="_self"&gt;https://stats.idre.ucla.edu/sas/seminars/sas-arrays/&lt;/A&gt;&lt;/P&gt;
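&lt;P&gt;As a rough sketch of the array idea, not a tested production solution: it assumes a hypothetical data set HAVE with a character variable TEXT, stop words already removed, blank, period and comma as delimiters, and at most 50 words per row. The names HAVE, NGRAMS, W and NGRAM are made up for illustration. The $40 in the ARRAY statement is what makes the array elements character rather than numeric.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical input: one row per record, text in a variable named TEXT */
data have;
  length text $200;
  text = "Lorem Ipsum text simply dummy text.";  /* stop word "is" already removed */
run;

data ngrams;
  set have;
  /* Character array: each element is $40; 50 words per row is an assumed cap */
  array w[50] $40 _temporary_;
  length ngram $200;
  nwords = min(countw(text, ' .,'), dim(w));
  /* Load the words of this row into the array */
  do i = 1 to nwords;
    w[i] = lowcase(scan(text, i, ' .,'));
  end;
  /* n = 1 unigram, n = 2 bigram, n = 3 trigram */
  do n = 1 to 3;
    do i = 1 to nwords - n + 1;
      ngram = w[i];
      do j = 1 to n - 1;
        ngram = catx(' ', ngram, w[i + j]);
      end;
      output;  /* one output row per n-gram occurrence */
    end;
  end;
  keep n ngram;
run;

/* Frequency of each n-gram, listed by n-gram size */
proc freq data=ngrams;
  tables n * ngram / list nocum nopercent;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To go beyond trigrams, raise the upper bound of the loop over n. Rows with fewer words than n simply produce no n-grams of that size.&lt;/P&gt;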
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2020 16:36:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697191#M213048</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-06T16:36:15Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697206#M213054</link>
      <description>&lt;P&gt;Thank you, Reeza, that was very helpful!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have mostly used arrays with numeric variables and never with text fields. The example you linked also used only numbers.&lt;/P&gt;
&lt;P&gt;I will look online for examples of arrays with text fields in SAS, but if you have a simple example, I'd like to see it so I can expand on it. Thank you again.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2020 17:10:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/697206#M213054</guid>
      <dc:creator>Venkat4</dc:creator>
      <dc:date>2020-11-06T17:10:04Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Unigram, Bigram, Trigram etc., from a Text field</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/698805#M213738</link>
      <description>The documentation should always be a reference. Are the examples there not clear?&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.5&amp;amp;docsetId=lestmtsref&amp;amp;docsetTarget=p08do6szetrxe2n136ush727sbuo.htm&amp;amp;locale=en#n1dhnzh7kwwgxcn16qw08x7wc76m" target="_blank"&gt;https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.5&amp;amp;docsetId=lestmtsref&amp;amp;docsetTarget=p08do6szetrxe2n136ush727sbuo.htm&amp;amp;locale=en#n1dhnzh7kwwgxcn16qw08x7wc76m&lt;/A&gt;</description>
      <pubDate>Fri, 13 Nov 2020 20:29:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Unigram-Bigram-Trigram-etc-from-a-Text-field/m-p/698805#M213738</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-13T20:29:41Z</dc:date>
    </item>
  </channel>
</rss>

