<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: count occurences of specific words in a separate table in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426019#M104946</link>
    <description>&lt;P&gt;What if the word "dog" appears twice in the same line?&lt;/P&gt;</description>
    <pubDate>Tue, 09 Jan 2018 10:12:12 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2018-01-09T10:12:12Z</dc:date>
    <item>
      <title>count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426016#M104945</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two tables. Table 1 looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;cat&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;dog&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;lion&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;hamster&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;bear&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;guinea pig&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And table 2 looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;my dog is stupid&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;that cat likes my dog&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;fat hamster&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;bear are dangerous&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;lions are dangerous&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;lions eat antelopes&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;I like cats and lions&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;giraffes are cute&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I'd like as a result is a count of the occurrences of each table 1 word in table 2.
To make myself clear, something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;words of table 1&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;occurrences in table 2&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;cat&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;dog&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;lion&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;hamster&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;bear&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;guinea pig&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You will notice that plural forms of words, like "lions" for instance, are counted. In fact, we're looking for the chain of character "lion" in table 2, not the exact word.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've been trying to split the strings in table 2 into words with the scan function, and running some proc freq and merges, but I'm not getting anywhere.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Don't hesitate to ask questions. English is not my native language, so I may not be very clear.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 10:10:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426016#M104945</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2018-01-09T10:10:16Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426019#M104946</link>
      <description>&lt;P&gt;What if the word "dog" appears twice in the same line?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 10:12:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426019#M104946</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-01-09T10:12:12Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426020#M104947</link>
      <description>Sorry, it would count as a single occurrence &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Tue, 09 Jan 2018 10:17:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426020#M104947</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2018-01-09T10:17:06Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426022#M104948</link>
      <description>&lt;P&gt;An SQL approach&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1;
input animals$50.;
datalines;
cat
dog
lion
hamster
bear
guinea pig
;

data have2;
infile datalines truncover;
input sentence $200.;
datalines;
my dog is stupid
that cat likes my dog
fat hamster
bear are dangerous
lions are dangerous
lions eat antelopes
I like cats and lions
giraffes are cute
;

proc sql;
	create table want as
	select animals
		  ,sum(ifn(find(sentence, strip(animals))&amp;gt;0, 1, 0)) as occurences
	from have1, have2
	group by animals
	order by calculated occurences desc;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
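&lt;P&gt;A small check of how FIND behaves may help when reading the query above. This is only a sketch: the sentence and the variable names hit_once, hit_case and hit_icase are made up for illustration and are not part of the original data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length sentence $200 animals $50;
  sentence = 'Lions like my dog and your dog';

  /* FIND returns the position of the first match, so the comparison
     with 0 yields 1 however many times "dog" appears in the line */
  animals  = 'dog';
  hit_once = (find(sentence, strip(animals)) &amp;gt; 0);

  /* The default search is case-sensitive: "lion" does not match "Lions" */
  animals  = 'lion';
  hit_case = (find(sentence, strip(animals)) &amp;gt; 0);

  /* Modifier 'i' ignores case, 't' trims trailing blanks */
  hit_icase = (find(sentence, animals, 'it') &amp;gt; 0);

  put hit_once= hit_case= hit_icase=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The log should show hit_once=1 hit_case=0 hit_icase=1. If mixed-case text is possible in the real tables, passing the 'i' modifier to FIND in the query is one option to consider.&lt;/P&gt;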
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 11:07:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426022#M104948</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-01-09T11:07:57Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426036#M104949</link>
      <description>&lt;P&gt;Thanks a lot. It works with the small data I gave as an example. Sadly, when I use it on the tables I'm actually working with (table 1 contains 1.7 million observations and table 2 contains 7k observations), I get a "sort execution failure" error. I'm now trying to deal with this. From what I've found, it could be a memory issue.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 11:43:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426036#M104949</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2018-01-09T11:43:35Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426060#M104954</link>
      <description>&lt;P&gt;Using a hash-object and an iterator:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1;
input animal $50.;

occurences = 0;

datalines;
cat
dog
lion
hamster
bear
guinea pig
;
run;

data have2;
infile datalines truncover;
input sentence $200.;

datalines;
my dog is stupid
that cat likes my dog
fat hamster
bear are dangerous
lions are dangerous
lions eat antelopes
I like cats and lions
giraffes are cute
;
run;

data _null_;
   if 0 then set work.have1;

   set work.have2 end= jobDone;

   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have1', multidata: 'yes');
      declare hiter iter('h');
      h.defineKey('animal');
      h.defineData('animal', 'occurences');
      h.defineDone();
   end;

   rc = iter.first();

   do while (rc = 0);
      if find(sentence, animal, 'it') then do;
         occurences = occurences + 1;
         h.replace();         
      end;

      rc = iter.next();
   end;

   if jobdone then do;
      h.output(dataset: 'work.want');
   end;
run;

proc sort data=work.want;
   by descending occurences;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If exact word matches were sufficient, something like the fifth example in &lt;A href="http://support.sas.com/documentation/cdl/en/lecompobjref/69740/HTML/default/viewer.htm#p00ilfw5pzcjvtn1nfya9863fozd.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lecompobjref/69740/HTML/default/viewer.htm#p00ilfw5pzcjvtn1nfya9863fozd.htm&lt;/A&gt; could be used instead, which would be faster and need less code.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 12:52:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426060#M104954</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2018-01-09T12:52:15Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426069#M104957</link>
      <description>&lt;P&gt;This program does not attempt a Cartesian join in memory, so it may be more efficient:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edited: I overlooked the fact that you wanted frequencies in descending order, with just 2 columns.&amp;nbsp; Here's a revised program using the same approach:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1;
input animals $50.;
datalines;
cat
dog
lion
hamster
bear
guinea pig
;

data have2;
infile datalines truncover;
input sentence $200.;
datalines;
my dog is stupid
that cat likes my dog
fat hamster
bear are dangerous
lions are dangerous
lions eat antelopes
I like cats and lions
giraffes are cute
;
proc sql noprint;
  select max(length(trim(animals))) into :maxlen from have1;
  select quote(trim(animals))       into :val_list separated by "," from have1;
quit;
%put _user_;

data want (keep=animals freq);
  array n_   {&amp;amp;sqlobs} ;
  array vals {&amp;amp;sqlobs} $&amp;amp;maxlen (&amp;amp;val_list);

  set have2 end=eoh;
  do w=1 to &amp;amp;sqlobs;
    n_{w} + (find(sentence, strip(vals{w}))&amp;gt;0);
  end;

  if eoh;
  do i=1 to &amp;amp;sqlobs;
    ix=whichn(max(of n_{*}),of n_{*});
    animals=vals{ix};
    freq=n_{ix};
    output;
    n_{ix}=.;
  end; 
run;&lt;/CODE&gt;&lt;/PRE&gt;
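&lt;P&gt;One caveat for the table sizes reported earlier in the thread (1.7 million rows in table 1, about 7k in table 2): a SAS macro variable holds at most 65,534 characters, so the val_list built above may not scale to a word list that large. A possible variation, sketched below under the assumption that have2 is the smaller table and is not empty, keeps have2 in a temporary array and streams have1 one row at a time. The names nsent and sent are hypothetical, and this is not the program posted above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  /* TRIMMED (SAS 9.3 and later) strips the leading blanks from the count */
  select count(*) into :nsent trimmed from have2;
quit;

data want (keep=animals freq);
  /* Hypothetical in-memory copy of the smaller table (have2) */
  array sent {&amp;amp;nsent} $200 _temporary_;

  /* Load every sentence once, on the first iteration only */
  if _n_ = 1 then do i = 1 to &amp;amp;nsent;
    set have2;
    sent{i} = sentence;
  end;

  /* Stream the larger table (have1), one word per iteration */
  set have1;
  freq = 0;
  do i = 1 to &amp;amp;nsent;
    /* Adds 1 when the word occurs at least once in the sentence */
    freq = freq + (find(sent{i}, strip(animals)) &amp;gt; 0);
  end;
run;

proc sort data=want;
  by descending freq;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This keeps memory use close to the size of have2, but the number of FIND calls is still rows(have1) times rows(have2), so it may still take a long time on the full tables.&lt;/P&gt;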
</description>
      <pubDate>Tue, 09 Jan 2018 13:36:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426069#M104957</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-01-09T13:36:06Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426094#M104969</link>
      <description>&lt;P&gt;I don't understand&lt;/P&gt;
&lt;P&gt;"&lt;SPAN&gt;In fact, we're looking for the chain of character "lion" in table 2, not the exact word."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1;
input animals$50.;
datalines;
cat
dog
lions
hamster
bear
guinea pig
;

data have2;
infile datalines truncover;
input sentence $200.;
datalines;
my dog is stupid
that cat likes my dog
fat hamster
bear are dangerous
lions are dangerous
lions eat antelopes
I like cats and lions
giraffes are cute
;

data _null_;
 if _n_=1 then do;
   if 0 then set have1;
   declare hash h(dataset:'have1');
   declare hiter hi('h');
   h.definekey('animals');
   h.definedone();

   declare hash hh();
   hh.definekey('animals');
   hh.definedata('animals','n');
   hh.definedone();
 end;
set have2 end=last;
do while(hi.next()=0);
 if find(sentence,strip(animals)) then do;
  if hh.find()=0 then do;n=n+1;hh.replace(); end;
   else do;n=1;hh.add();end;
 end;
end;

if last then hh.output(dataset:'want');
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 09 Jan 2018 13:54:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426094#M104969</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-01-09T13:54:32Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426431#M105057</link>
      <description>It works with small data tables, but my SAS session still crashes when I try to run it on the tables with more than 1 million observations.</description>
      <pubDate>Wed, 10 Jan 2018 13:27:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426431#M105057</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2018-01-10T13:27:35Z</dc:date>
    </item>
    <item>
      <title>Re: count occurences of specific words in a separate table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426433#M105059</link>
      <description>This one also works with small data tables, but it crashes my SAS session when I try to run it on the tables with more than 1 million observations. Sorry everyone, I guess it's just a memory problem. I can run a simple data step on those tables, but your programs make my session crash for some reason.</description>
      <pubDate>Wed, 10 Jan 2018 13:30:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/count-occurences-of-specific-words-in-a-separate-table/m-p/426433#M105059</guid>
      <dc:creator>alex_philby</dc:creator>
      <dc:date>2018-01-10T13:30:33Z</dc:date>
    </item>
  </channel>
</rss>

