<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: count occurrences of multiple words in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423097#M104027</link>
    <description>&lt;P&gt;To understand the purpose of that DO group, consider this sentence:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes it difficult&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Without that code, both of these pairs would be output:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes (the first and second words)&lt;/P&gt;
&lt;P&gt;makes it (the second and third words)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because the two pairs have different key values, sorting with NODUPKEY would not remove either of them.&amp;nbsp; So that DO group swaps the values of WORD1 and WORD2 whenever WORD1 would sort after WORD2, making WORD1 the word that alphabetizes first (and WORD2 the word that alphabetizes second).&amp;nbsp; With the DO group in place, the result comes out as:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes&lt;/P&gt;
&lt;P&gt;it makes&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then NODUPKEY finds the duplicates and removes one.&lt;/P&gt;</description>
    <pubDate>Thu, 21 Dec 2017 14:19:18 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2017-12-21T14:19:18Z</dc:date>
    <item>
      <title>count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422695#M103936</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let's say I have the following table:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="Liberation Serif"&gt;dog is red&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="Liberation Serif"&gt;I love that red dog&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="Liberation Serif"&gt;hot dog&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="Liberation Serif"&gt;red cat&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="Liberation Serif"&gt;that cat hates a red dog&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'd like a count of the occurrences of every two-word group, something that would look like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;two-word group&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;count&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;red+dog&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;that+dog&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;red+cat&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have managed to count the occurrences of each word, thanks to another thread, but I have no idea how to do this for pairs of words.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2017 14:43:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422695#M103936</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2017-12-20T14:43:43Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422705#M103937</link>
      <description>&lt;P&gt;Do you want to specify the two-word groups yourself, or do you want a list of all possible two-word groups from the sentences above?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2017 14:59:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422705#M103937</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2017-12-20T14:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422713#M103938</link>
      <description>&lt;P&gt;One issue to consider:&amp;nbsp; what if the incoming string repeats a word?&amp;nbsp; Should both instances of the same word be used, doubling the number of pairs?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The approach below originally counted double when a word appeared twice.&amp;nbsp; You can program around that, and the statements in red (see the edit note at the end) do so.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data pairs;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;if countw(string) &amp;gt; 1;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#ff0000"&gt;obsno = _n_;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;do j=1 to countw(string) - 1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; do k=j+1 to countw(string);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word1 = scan(string, j);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word2 = scan(string, k);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if word1 &amp;gt; word2 then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; temp = word2;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word2 = word1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word1 = temp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;keep word1 word2 &lt;FONT color="#ff0000"&gt;obsno&lt;/FONT&gt;;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#ff0000"&gt;proc sort data=pairs nodupkey;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#ff0000"&gt;by obsno word1 word2;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#ff0000"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc freq data=pairs;&lt;/P&gt;
&lt;P&gt;tables word1 * word2 / list;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
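&lt;P&gt;To try the steps above, the data set HAVE has to exist first. A minimal sketch, assuming a single character variable named STRING (the length of 50 is an assumption) and using the sentences from the original question:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Test input only: the length of 50 is an assumption */
data have;
   infile datalines truncover;
   input string $50.;
   datalines;
dog is red
I love that red dog
hot dog
red cat
that cat hates a red dog
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;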
&lt;P&gt;&lt;FONT color="#ff0000"&gt;*** EDITED to remove duplicate pairs within the same observation.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2017 16:07:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422713#M103938</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-12-20T16:07:10Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422740#M103942</link>
      <description>I want a list of every possible two-word group.</description>
      <pubDate>Wed, 20 Dec 2017 15:45:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422740#M103942</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2017-12-20T15:45:13Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422742#M103943</link>
      <description>&lt;P&gt;Are you looking for specific words, or for all two-word combinations, not including duplicates? It seems the order doesn't matter, which is slightly different from an n-gram. Are you including words such as "I", "the", and "is" (articles and conjunctions)?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2017 15:45:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422742#M103943</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-20T15:45:55Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422747#M103944</link>
      <description>&lt;HR /&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Astounding wrote:&lt;/P&gt;&lt;P&gt;One issue to consider:&amp;nbsp; what if the incoming string repeats a word?&amp;nbsp; Should both instances of the same word be used, doubling the number of pairs?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In principle, I would not want both instances of the same word to be used. In practice, though, no string in my table repeats a word, except for small linking words (like "of", for example). I just have to drop all these small words with a quick data step, so your code works great for me! Thank you.&lt;/P&gt;&lt;P&gt;For other users, though, it might be useful to find a solution to this issue.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2017 15:53:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422747#M103944</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2017-12-20T15:53:08Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422749#M103946</link>
      <description>Yes, all two-word combinations. Excluding duplicates, and preferably articles and conjunctions as well, as stated above.</description>
      <pubDate>Wed, 20 Dec 2017 15:55:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422749#M103946</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2017-12-20T15:55:47Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422759#M103947</link>
      <description>&lt;P&gt;I'll leave the articles and conjunctions part up to you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's one approach. I split the text into individual words, and then use SQL to find all two word combinations.&amp;nbsp;&lt;BR /&gt;Note that this method makes it easy to find any n-gram or join with other lookup tables for sentiment analysis type work.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;*Create sample data;
data random_sentences;
	infile cards truncover;
	informat sentence $256.;
	input sentence $256.;
	cards;
This is a random sentence
This is another random sentence
Happy Birthday
My job sucks.
This is a good idea, not.
This is an awesome idea!
How are you today?
Does this make sense?
Have a great day!
;

*Partition into words;
data f1;
	set random_sentences;
	id=_n_;
	nwords=countw(sentence);
	nchar=length(compress(sentence));

	do word_order=1 to nwords;
		word=scan(sentence, word_order);
		output;
	end;
run;

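proc sql;
	/* Pair every word with each earlier word of the same sentence.
	   Matching on id keeps two identical sentences from being mixed.
	   Pairs are not alphabetized, so "red dog" and "dog red" count separately. */
	create table words2 as
		select t1.sentence, lowcase(t1.word) as word1, lowcase(t2.word) as word2
			from f1 as t1
				cross join f1 as t2
			where t1.id=t2.id 
				and t1.word_order &amp;gt; t2.word_order
			order by t1.sentence, t1.word_order;
quit;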

proc freq data=words2 noprint order=freq;
	table word1*word2 /list out=want;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Dec 2017 16:13:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/422759#M103947</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-20T16:13:43Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423027#M104006</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if word1 &amp;gt; word2 then do;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; temp = word2;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word2 = word1;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; word1 = temp;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It works great, but could you please explain what this part of the code in the loop does?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Dec 2017 10:14:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423027#M104006</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2017-12-21T10:14:34Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423044#M104010</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
/* INFILE goes before INPUT. DLM=',' keeps the blanks in each sentence as part of STRING. */
infile datalines dlm=',';
input string:$50.;
datalines;
dog is red
I love that red dog
hot dog
red cat
that cat hates a red dog
;

data help(keep=word);
	set have;
	nWords=countw(string);
	do i=1 to nWords;
		word=scan(string, i);
		output;
	end;
run;

proc sort data=help nodupkey;
	by word;
run;

proc transpose data=help out=help_wide(drop=_NAME_) prefix=word;
	var word;
run;

data help2(keep=word1 word2);
	set help_wide;
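	array words{*} $ word1-word10; /* 10 distinct words in HAVE; adjust for other data */
	ncomb=comb(dim(words), 2);
	do i=1 to ncomb;
		/* ALLCOMB rearranges the array so that word1 and word2 hold the i-th pair */
		twoWords=allcomb(i, 2, of words[*]);
		output;
	end;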
run;

proc sql;
	create table want as
	select a.*
		  ,sum(case when (findw(strip(string), strip(word1)) &amp;amp; 
					      findw(strip(string), strip(word2))) then 1
		   else 0 end) as Count
	from help2 as a, have as b
	group by word1, word2
	having calculated count &amp;gt; 0
	order by calculated Count desc;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Dec 2017 10:51:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423044#M104010</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2017-12-21T10:51:15Z</dc:date>
    </item>
    <item>
      <title>Re: count occurrences of multiple words</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423097#M104027</link>
      <description>&lt;P&gt;To understand the purpose of that DO group, consider this sentence:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes it difficult&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Without that code, both of these pairs would be output:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes (the first and second words)&lt;/P&gt;
&lt;P&gt;makes it (the second and third words)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because the two pairs have different key values, sorting with NODUPKEY would not remove either of them.&amp;nbsp; So that DO group swaps the values of WORD1 and WORD2 whenever WORD1 would sort after WORD2, making WORD1 the word that alphabetizes first (and WORD2 the word that alphabetizes second).&amp;nbsp; With the DO group in place, the result comes out as:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;it makes&lt;/P&gt;
&lt;P&gt;it makes&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
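&lt;P&gt;The swap can be checked on that sentence alone. A minimal sketch, assuming a throwaway data set named DEMO and variable lengths chosen just for illustration (neither comes from the posted code):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Illustration only: DEMO and the lengths below are assumptions */
data demo;
   length string $40 word1 word2 temp $20;
   string = 'it makes it difficult';
   do j = 1 to countw(string) - 1;
      do k = j + 1 to countw(string);
         word1 = scan(string, j);
         word2 = scan(string, k);
         /* put the alphabetically first word in WORD1 */
         if word1 &amp;gt; word2 then do;
            temp = word2;
            word2 = word1;
            word1 = temp;
         end;
         output;
      end;
   end;
   keep word1 word2;
run;

/* list the pairs as they look after the swap */
proc print data=demo;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The DATA step writes six pairs, two of which are "it makes".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;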
&lt;P&gt;Then NODUPKEY finds the duplicates and removes one.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Dec 2017 14:19:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-multiple-words/m-p/423097#M104027</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-12-21T14:19:18Z</dc:date>
    </item>
  </channel>
</rss>

