<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [tm] counting multiple substrings in one string in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386532#M9864</link>
    <description>&lt;P&gt;"&lt;SPAN&gt;&amp;nbsp;it's all in macro"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;There is your problem right there. &amp;nbsp;Data should be in datasets - that is what they are for. &amp;nbsp;Once data is in datasets, then you use Base SAS code to analyze that data. &amp;nbsp;For example, if I had a string in a dataset, I could achieve a count of all words quite simply with two steps:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1) a data step outputs each word of any number of strings as one observation per word&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2) run proc freq on the resulting dataset to get a dataset with the unique words and their counts within the data&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Macro&amp;nbsp;&lt;U&gt;is not&lt;/U&gt; the place to be doing data processing; it is nothing more than a find/replace system for generating text.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2017 09:44:19 GMT</pubDate>
    <dc:creator>RW9</dc:creator>
    <dc:date>2017-08-09T09:44:19Z</dc:date>
    <item>
      <title>[tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386517#M9861</link>
      <description>&lt;P&gt;searched around but couldn't find what i need.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;example,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;i can use prxmatch("m/this|what|need/oi",string);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;but it only returns the position of the first word.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;how do i count all of the words in this string?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 08:28:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386517#M9861</guid>
      <dc:creator>Grumbler</dc:creator>
      <dc:date>2017-08-09T08:28:26Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386520#M9862</link>
      <description>&lt;P&gt;I'm normally a big advocate of regular expressions but this is simpler&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
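&lt;P&gt;For comparison, a regular-expression version needs a loop over CALL PRXNEXT, roughly as in this sketch (the variable names pattern, start, stop, position and len are illustrative). The COUNT version after it does the same job in one line.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
	string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
	pattern=prxparse("/this|what|need/i"); /* compile the pattern once */
	start=1;
	stop=length(string);
	count=0;
	/* each call finds the next match and moves start past it */
	call prxnext(pattern,start,stop,string,position,len);
	do while (position ne 0); /* position is 0 once no match is left */
		count=count+1;
		call prxnext(pattern,start,stop,string,position,len);
	end;
	put count=;
run;&lt;/CODE&gt;&lt;/PRE&gt;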
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
	string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
	count=count(string,'this')+count(string,'what')+count(string,'need');
	put count=;

run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 Aug 2017 08:51:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386520#M9862</guid>
      <dc:creator>ChrisBrooks</dc:creator>
      <dc:date>2017-08-09T08:51:04Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386526#M9863</link>
      <description>&lt;P&gt;but this only works for 3 words.&amp;nbsp; i have hundreds of keywords that i would like to count.&amp;nbsp; can't type it all like this.&amp;nbsp; it's all in macro.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 08:57:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386526#M9863</guid>
      <dc:creator>Grumbler</dc:creator>
      <dc:date>2017-08-09T08:57:33Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386532#M9864</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;&amp;nbsp;it's all in macro"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;There is your problem right there. &amp;nbsp;Data should be in datasets - that is what they are for. &amp;nbsp;Once data is in datasets, then you use Base SAS code to analyze that data. &amp;nbsp;For example, if I had a string in a dataset, I could achieve a count of all words quite simply with two steps:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1) a data step outputs each word of any number of strings as one observation per word&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2) run proc freq on the resulting dataset to get a dataset with the unique words and their counts within the data&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Macro&amp;nbsp;&lt;U&gt;is not&lt;/U&gt; the place to be doing data processing; it is nothing more than a find/replace system for generating text.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 09:44:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386532#M9864</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-08-09T09:44:19Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386536#M9865</link>
      <description>&lt;P&gt;In that case you'll need to give us a sample of your keywords, input and output in the form of have and want data sets, because (as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9&lt;/a&gt;&amp;nbsp;says) this really should be done in data step.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 10:14:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386536#M9865</guid>
      <dc:creator>ChrisBrooks</dc:creator>
      <dc:date>2017-08-09T10:14:14Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386645#M9866</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data k;
input k $;
cards;
this 
what 
need
;
run;
data have;
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
output;
run;

/* pair every string with every keyword; the 'i' modifier makes count() ignore case */
proc sql;
select string,sum(count(string,strip(k),'i')) as n
 from have,k
  group by string;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 Aug 2017 14:12:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386645#M9866</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-08-09T14:12:37Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386852#M9867</link>
      <description>&lt;P&gt;thanks everyone for the tips.&amp;nbsp; i guess i should clarify a bit more.&amp;nbsp; what i have is millions of records of "strings" in one variable.&amp;nbsp; i have another maybe 10 or 20 lists of key words.&amp;nbsp; i would like to count each list of key words in the millions of "strings" and see which list has most frequency.&amp;nbsp; then i will decide how to categorize these strings.&amp;nbsp;&amp;nbsp;was just wondering if there is a&amp;nbsp;fast way to do that.&amp;nbsp; thanks.&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Aug 2017 01:19:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386852#M9867</guid>
      <dc:creator>Grumbler</dc:creator>
      <dc:date>2017-08-10T01:19:49Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386887#M9868</link>
      <description>&lt;P&gt;Well, with no test data to run with I am guessing here but something like:&lt;/P&gt;
&lt;PRE&gt;data biglist;
  length string $2000;
  string="a big dog walks around"; output;
  string="something happened other wise"; output;
  string="this is a wise old string with big connotations"; output;
run;

data words;
  length word $2000;
  word="dog"; output;
  word="big"; output;
  word="wise"; output;
run;

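/* split each string into one observation per space-delimited word */
data inter (drop=i string);
  set biglist;
  do i=1 to countw(string," ");
    wrd=scan(string,i," ");
    output;
  end;
run;

/* keep only the words that appear in the keyword list (exact, case-sensitive match) */
proc sql;
  delete from inter 
  where wrd not in (select word from words);
quit;

/* count how often each remaining keyword occurs */
proc freq data=inter;
  tables wrd / out=want;
run;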
&lt;/PRE&gt;
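&lt;P&gt;The last two steps can also be sketched in the opposite order, assuming inter and words exist as built above and the delete step has not been run (all_counts is a hypothetical intermediate name):&lt;/P&gt;
&lt;PRE&gt;/* count every word in the unfiltered data, without printed output */
proc freq data=inter noprint;
  tables wrd / out=all_counts;
run;

/* filter the much smaller result: one row per distinct word */
proc sql;
  create table want as
  select * from all_counts
  where wrd in (select word from words);
quit;
&lt;/PRE&gt;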
&lt;P&gt;You can drop the sql delete and run proc freq over all the data, then filter the results. That might use fewer resources; you will need to try it.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Aug 2017 07:56:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386887#M9868</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-08-10T07:56:36Z</dc:date>
    </item>
    <item>
      <title>Re: [tm] counting multiple substrings in one string</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386973#M9869</link>
      <description>You could try my SQL. Maybe that is not too slowly.
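&lt;P&gt;A sketch of how that SQL might extend to several keyword lists, assuming a hypothetical listname variable that tags each keyword with its list (the sample values are made up). With millions of strings the cross join could be heavy, so it is worth timing on a subset first.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data k;
input listname $ k $;
cards;
listA this
listA what
listB need
;
run;
data have;
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
output;
run;

/* one row per list: total case-insensitive hits across all strings */
proc sql;
select listname,sum(count(string,strip(k),'i')) as n
 from have,k
  group by listname
   order by n desc;
quit;&lt;/CODE&gt;&lt;/PRE&gt;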
A faster way I can think of is to use a hash table.</description>
      <pubDate>Thu, 10 Aug 2017 13:01:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/tm-counting-multiple-substrings-in-one-string/m-p/386973#M9869</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-08-10T13:01:52Z</dc:date>
    </item>
  </channel>
</rss>

