<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Count the number of a specific word in a character field in PROC SQL in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473932#M121685</link>
    <description>&lt;P&gt;In a dataset with more than a million rows, I have a character field called&amp;nbsp;&lt;EM&gt;SERVICE&amp;nbsp;&lt;/EM&gt;that has a length of 1000 bytes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The value of &lt;EM&gt;SERVICE &lt;/EM&gt;looks like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;123,XYZ456,BRK000,ENG789,&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;963,&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;778&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Basically, 3 characters followed by 3 digits combined. That is the pattern.&lt;/P&gt;&lt;P&gt;There are multiple values like that separated by comma. I've provided a small example above but the total length can go up to 1000 at the most.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to create a new variable called &lt;EM&gt;COUNT_ABC&lt;/EM&gt; which contains a value that represents the number of occurences of the word &lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;ABC&lt;/STRONG&gt; &lt;/FONT&gt;in the field &lt;EM&gt;SERVICE&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;For the above example, the value of &lt;EM&gt;COUNT_ABC&lt;/EM&gt; should be &lt;STRONG&gt;3&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The challenge I'm facing here is to accomplish this in PROC SQL because this is an addition to an existing modeling code that is creating about 800 variables using lots of calculations and aggregations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I cannot use DATA step.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the length of the field is less than 200 bytes, I would use the following logic:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
CREATE TABLE FINAL AS
SELECT (LENGTH(SERVICE) - LENGTH(TRANSTRN(SERVICE,"ABC",TRIMN(" "))))/3 AS COUNT_ABC
FROM INPUT;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since TRANSTRN is a SAS character function that truncates the output to 200 bytes, this method won't work in my case where the length can go up to 1000 bytes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm anxious to solve this in PROC SQL and without the use of user/custom defined functions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe there is way to solve this using a PRX function? I'm not an expert in this though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The following logic removes all &lt;STRONG&gt;ABC&lt;/STRONG&gt;s.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;SELECT PRXCHANGE('s/(?:ABC)/$1/i',-1,SERVICE) AS ONLY_ABC&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the resulting text is like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;123,XYZ456,BRK000,ENG789,963,778&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I could find a way to do the opposite(that is to keep only the ABCs and strip off everything else including numbers and commas)&lt;/P&gt;&lt;P&gt;Like:&lt;/P&gt;&lt;P&gt;ABCABCABC&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then I can use the length function to get the count because the maximum length in that case will only be like around 60.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Thu, 28 Jun 2018 04:42:17 GMT</pubDate>
    <dc:creator>kariwarez</dc:creator>
    <dc:date>2018-06-28T04:42:17Z</dc:date>
    <item>
      <title>Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473932#M121685</link>
      <description>&lt;P&gt;In a dataset with more than a million rows, I have a character field called&amp;nbsp;&lt;EM&gt;SERVICE&amp;nbsp;&lt;/EM&gt;that has a length of 1000 bytes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The value of &lt;EM&gt;SERVICE &lt;/EM&gt;looks like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;123,XYZ456,BRK000,ENG789,&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;963,&lt;FONT color="#FF0000"&gt;ABC&lt;/FONT&gt;778&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Basically, 3 characters followed by 3 digits combined. That is the pattern.&lt;/P&gt;&lt;P&gt;There are multiple values like that separated by comma. I've provided a small example above but the total length can go up to 1000 at the most.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to create a new variable called &lt;EM&gt;COUNT_ABC&lt;/EM&gt; which contains a value that represents the number of occurences of the word &lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;ABC&lt;/STRONG&gt; &lt;/FONT&gt;in the field &lt;EM&gt;SERVICE&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;For the above example, the value of &lt;EM&gt;COUNT_ABC&lt;/EM&gt; should be &lt;STRONG&gt;3&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The challenge I'm facing here is to accomplish this in PROC SQL because this is an addition to an existing modeling code that is creating about 800 variables using lots of calculations and aggregations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I cannot use DATA step.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the length of the field is less than 200 bytes, I would use the following logic:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
CREATE TABLE FINAL AS
SELECT (LENGTH(SERVICE) - LENGTH(TRANSTRN(SERVICE,"ABC",TRIMN(" "))))/3 AS COUNT_ABC
FROM INPUT;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since TRANSTRN is a SAS character function that truncates the output to 200 bytes, this method won't work in my case where the length can go up to 1000 bytes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm anxious to solve this in PROC SQL and without the use of user/custom defined functions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe there is way to solve this using a PRX function? I'm not an expert in this though.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The following logic removes all &lt;STRONG&gt;ABC&lt;/STRONG&gt;s.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;SELECT PRXCHANGE('s/(?:ABC)/$1/i',-1,SERVICE) AS ONLY_ABC&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the resulting text is like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;123,XYZ456,BRK000,ENG789,963,778&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I could find a way to do the opposite(that is to keep only the ABCs and strip off everything else including numbers and commas)&lt;/P&gt;&lt;P&gt;Like:&lt;/P&gt;&lt;P&gt;ABCABCABC&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Then I can use the length function to get the count because the maximum length in that case will only be like around 60.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 04:42:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473932#M121685</guid>
      <dc:creator>kariwarez</dc:creator>
      <dc:date>2018-06-28T04:42:17Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473937#M121688</link>
      <description>&lt;P&gt;Can't you simply use the COUNT function like so?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
	length service $1000;
	infile datalines;
	input service;
	datalines;
ABC123,XYZ456,BRK000,ENG789,ABC963,ABC778
;
run;

proc sql;
	create table want as
	select service, count(service,"ABC") as count_abc
	from have;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 27 Jun 2018 23:43:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473937#M121688</guid>
      <dc:creator>ChrisBrooks</dc:creator>
      <dc:date>2018-06-27T23:43:46Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473968#M121707</link>
      <description>&lt;P&gt;If you reserve a large enough&amp;nbsp;space for the function result, TRANTRN will not truncate:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data input;
call streaminit(99796);
length service $1000;
service = "XYZ123";
do i = 1 to 99;
    service = cats(service,",ABC",put(1000*rand("uniform"),z3.0));
    end;
drop i;
run;

PROC SQL;
CREATE TABLE FINAL(drop=s2) AS
SELECT 
    TRANSTRN(SERVICE,"ABC",TRIMN(" ")) as s2 length=1000,
    (LENGTH(SERVICE) - LENGTH(calculated s2))/3 AS COUNT_ABC
FROM INPUT;
select * from final;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 28 Jun 2018 04:14:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473968#M121707</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-06-28T04:14:11Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473969#M121708</link>
      <description>&lt;P&gt;Well this is really embarrassing. I've been programming in SAS for 5 years and this will haunt me forever.&lt;/P&gt;&lt;P&gt;I was looking at all the wrong places for almost 10 hours. I wouldn't have even thought to look at this function. Completely overlooked it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 04:18:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473969#M121708</guid>
      <dc:creator>kariwarez</dc:creator>
      <dc:date>2018-06-28T04:18:44Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473970#M121709</link>
      <description>&lt;P&gt;So I cannot create an intermediate variable like S2 in your example.&lt;/P&gt;&lt;P&gt;I can't use DROP=S2 because&amp;nbsp;I'm using the FEEDBACK option in PROC SQL and another program scans the log to recreate the entire query. And if you've used FEEDBACK option, you'd notice that the log doesn't print the "CREATE TABLE" part. It only starts printing from the SELECT clause.&lt;/P&gt;&lt;P&gt;My code has lots of loops and macros. The code has like 60 rows of lines but when expanded, it is creating around 800 variables.&lt;/P&gt;&lt;P&gt;I just cannot use any intermediate variable.&lt;/P&gt;&lt;P&gt;Sorry if I didn't make this clear in my question.&lt;/P&gt;&lt;P&gt;There are really a ton of ways to solve this but not a lot with my limitation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I got the solution from &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32246"&gt;@ChrisBrooks&lt;/a&gt;..&amp;nbsp;To use the COUNT function.&lt;/P&gt;&lt;P&gt;I'm familiar with COUNTW, COUNTC functions.&lt;/P&gt;&lt;P&gt;I always thought COUNT only works the way it works in SQL and never realized it was also a SAS function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 04:45:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473970#M121709</guid>
      <dc:creator>kariwarez</dc:creator>
      <dc:date>2018-06-28T04:45:00Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of a specific word in a character field in PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473971#M121710</link>
      <description>&lt;P&gt;I learned something too, thanks &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32246"&gt;@ChrisBrooks&lt;/a&gt;!&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 04:33:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Count-the-number-of-a-specific-word-in-a-character-field-in-PROC/m-p/473971#M121710</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-06-28T04:33:49Z</dc:date>
    </item>
  </channel>
</rss>

