Help using Base SAS procedures

Count occurence of each word in a table

Accepted Solution Solved
Reply
Regular Contributor
Posts: 186
Accepted Solution

Count occurence of each word in a table

Hi,

Lets say I have the following table :

COMMENTS
My cat is blue
I like dogs
The cat is blue

I would like to write a query that would count each word in each line. The logic would be that each character string preceded or followed by a space is a word.

The result would be something like :

My 1

cat 2

is 2

blue 2

dogs 1

I 1

The 1

I hope you can help me write this query.

Thank you for your help and time.


Accepted Solutions
Solution
‎01-08-2015 06:27 PM
Respected Advisor
Posts: 4,663

Re: Count occurence of each word in a table

Not a query, but efficient:

data have;

input COMMENTS $80.;

datalines;

My cat is blue

I like dogs

The cat is blue

;

data temp / view=temp;

length word $12;

set have;

do i = 1 by 1 until(missing(word));

    word = upcase(scan(COMMENTS, i));

    if not missing(word) then output;

    end;

keep word;

run;

proc freq data=temp;

table word / noprint out=wordfreq(drop=percent);

run;

or if you want a query, replace last step by:

proc sql;

create table wordcount2 as

select word, count(*) as COUNT

from temp

group by word;

quit;

PG

PG

View solution in original post


All Replies
Solution
‎01-08-2015 06:27 PM
Respected Advisor
Posts: 4,663

Re: Count occurence of each word in a table

Not a query, but efficient:

data have;

input COMMENTS $80.;

datalines;

My cat is blue

I like dogs

The cat is blue

;

data temp / view=temp;

length word $12;

set have;

do i = 1 by 1 until(missing(word));

    word = upcase(scan(COMMENTS, i));

    if not missing(word) then output;

    end;

keep word;

run;

proc freq data=temp;

table word / noprint out=wordfreq(drop=percent);

run;

or if you want a query, replace last step by:

proc sql;

create table wordcount2 as

select word, count(*) as COUNT

from temp

group by word;

quit;

PG

PG
Respected Advisor
Posts: 3,124

Re: Count occurence of each word in a table

Here is another possibility using Hash, also efficient. (Part of code was stolen from PG's post):

data have;

     input COMMENTS $80.;

     datalines;

My cat is blue

I like dogs

The cat is blue

;

data _null_;

     if _n_=1 then

           do;

                declare hash h();

                h.definekey('word');

                h.definedata('word', 'count');

                h.definedone();

           end;

     length word $12;

     count=.;

     set have end=done;

     do i = 1 by 1 until(missing(word));

           word = upcase(scan(COMMENTS, i));

           if not missing(word) then

                do;

                     rc=h.find();

                     if rc=0 then

                           do;

                                count+1;

                                rc=h.replace();

                           end;

                     else

                           do;

                                count=1;

                                rc=h.replace();

                           end;

                end;

     end;

     if done then

           rc=h.output(dataset:'want');

run;

Super Contributor
Posts: 257

Re: Count occurence of each word in a table

Yet another  simplified version of HASH Solution:

data have;

length word $8;

input;

nwords = countw(_infile_, ' ');

do i = 1 to nwords;

   word = scan(_infile_, i, ' ');

   output;

end;

datalines;

My cat is blue

I like dogs

The cat is blue

;

run;

data _null_;

   if _n_ = 1 then do;

      length word $8;

      declare hash h();

      h.definekey('word');

      h.definedata('word', 'count');

      h.definedone();

   end;

   do until(last);

      set have end = last;

      if h.find() ^= 0 then count = 0;

      count + 1;

      h.replace();

   end;

   if last then h.output(dataset:'W_Counts');

run;

proc print data = w_counts;

run;

Trusted Advisor
Posts: 1,300

Re: Count occurence of each word in a table

You could also use hash with SUMINC to keep track of your count.

data have;

    input COMMENTS $80.;

    datalines;

My cat is blue

I like dogs

The cat is blue

;

data want;

    length word $ 12;

    cnt=1;

    declare hash h(suminc:'cnt');

    h.definekey('word');

    h.definedone();

    do until(done);

        set have end=done;

        do _n_=1 by 1 until(missing(word));

            word=scan(comments, _n_);

            h.ref();

        end;

    end;

    rc=h.output(dataset:'foo');

    declare hiter hi('h');

    rc=hi.first();

    do while(rc=0);

        rc=h.sum(sum:count);

        output;

        rc=hi.next();

    end;

    keep word count;

run;

Regular Contributor
Posts: 186

Re: Count occurence of each word in a table

Thank you all for your replies.

PGStats solution works like a charm.

PG one thing I am not sure I fully understand in your code is the 1 by 1 until step...

Is 1 by 1 a required statement for an until loop in the SAS language?

What does 1 by 1 really do in the loop?

Thank you for your help

Respected Advisor
Posts: 4,663

Re: Count occurence of each word in a table

do i = 1 by 1 until(some condition); is the same as do until(some condition); but provides an iteration counter, i, that can be used inside the loop. There is no upper limit on the number of iterations. You can add an upper limit if you want, with the syntax do i = 1 to 10000 by 1 until(some condition); - PG

PG
Regular Contributor
Posts: 186

Re: Count occurence of each word in a table

Thank you both for your replies.

Everything is much clearer now!

Best regards.

Frequent Contributor
Posts: 115

Re: Count occurence of each word in a table

PG is basically using that as a index or in other words count variable to retireve each word in the string one by one as the scan function does scan(string,count) where count can be number, variable or a derived expression. so in your case he is using index variable as count to extract starting from my, then cat and so on.

You could also use something like:

do until(word=' ');

  count+1;

  word = scan(string, count);

   if not missing(word) then output;

  end;

Just different ways to write the code.

Regards,

Naveen

Respected Advisor
Posts: 4,663

Re: Count occurence of each word in a table

You would also need to initialize count=0 before the loop.

PG
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 966 views
  • 7 likes
  • 6 in conversation