BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nicnad
Fluorite | Level 6

Hi,

Lets say I have the following table :

COMMENTS
My cat is blue
I like dogs
The cat is blue

I would like to write a query that would count each word in each line. The logic would be that each character string preceded or followed by a space is a word.

The result would be something like :

My 1

cat 2

is 2

blue 2

dogs 1

I 1

The 1

I hope you can help me write this query.

Thank you for your help and time.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Not a query, but efficient:

data have;

input COMMENTS $80.;

datalines;

My cat is blue

I like dogs

The cat is blue

;

data temp / view=temp;

length word $12;

set have;

do i = 1 by 1 until(missing(word));

    word = upcase(scan(COMMENTS, i));

    if not missing(word) then output;

    end;

keep word;

run;

proc freq data=temp;

table word / noprint out=wordfreq(drop=percent);

run;

or if you want a query, replace last step by:

proc sql;

create table wordcount2 as

select word, count(*) as COUNT

from temp

group by word;

quit;

PG

PG

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

Not a query, but efficient:

data have;

input COMMENTS $80.;

datalines;

My cat is blue

I like dogs

The cat is blue

;

data temp / view=temp;

length word $12;

set have;

do i = 1 by 1 until(missing(word));

    word = upcase(scan(COMMENTS, i));

    if not missing(word) then output;

    end;

keep word;

run;

proc freq data=temp;

table word / noprint out=wordfreq(drop=percent);

run;

or if you want a query, replace last step by:

proc sql;

create table wordcount2 as

select word, count(*) as COUNT

from temp

group by word;

quit;

PG

PG
Haikuo
Onyx | Level 15

Here is another possibility using Hash, also efficient. (Part of code was stolen from PG's post):

data have;

     input COMMENTS $80.;

     datalines;

My cat is blue

I like dogs

The cat is blue

;

data _null_;

     if _n_=1 then

           do;

                declare hash h();

                h.definekey('word');

                h.definedata('word', 'count');

                h.definedone();

           end;

     length word $12;

     count=.;

     set have end=done;

     do i = 1 by 1 until(missing(word));

           word = upcase(scan(COMMENTS, i));

           if not missing(word) then

                do;

                     rc=h.find();

                     if rc=0 then

                           do;

                                count+1;

                                rc=h.replace();

                           end;

                     else

                           do;

                                count=1;

                                rc=h.replace();

                           end;

                end;

     end;

     if done then

           rc=h.output(dataset:'want');

run;

KachiM
Rhodochrosite | Level 12

Yet another  simplified version of HASH Solution:

data have;

length word $8;

input;

nwords = countw(_infile_, ' ');

do i = 1 to nwords;

   word = scan(_infile_, i, ' ');

   output;

end;

datalines;

My cat is blue

I like dogs

The cat is blue

;

run;

data _null_;

   if _n_ = 1 then do;

      length word $8;

      declare hash h();

      h.definekey('word');

      h.definedata('word', 'count');

      h.definedone();

   end;

   do until(last);

      set have end = last;

      if h.find() ^= 0 then count = 0;

      count + 1;

      h.replace();

   end;

   if last then h.output(dataset:'W_Counts');

run;

proc print data = w_counts;

run;

FriedEgg
SAS Employee

You could also use hash with SUMINC to keep track of your count.

data have;

    input COMMENTS $80.;

    datalines;

My cat is blue

I like dogs

The cat is blue

;

data want;

    length word $ 12;

    cnt=1;

    declare hash h(suminc:'cnt');

    h.definekey('word');

    h.definedone();

    do until(done);

        set have end=done;

        do _n_=1 by 1 until(missing(word));

            word=scan(comments, _n_);

            h.ref();

        end;

    end;

    rc=h.output(dataset:'foo');

    declare hiter hi('h');

    rc=hi.first();

    do while(rc=0);

        rc=h.sum(sum:count);

        output;

        rc=hi.next();

    end;

    keep word count;

run;

nicnad
Fluorite | Level 6

Thank you all for your replies.

PGStats solution works like a charm.

PG one thing I am not sure I fully understand in your code is the 1 by 1 until step...

Is 1 by 1 a required statement for an until loop in the SAS language?

What does 1 by 1 really do in the loop?

Thank you for your help

PGStats
Opal | Level 21

do i = 1 by 1 until(some condition); is the same as do until(some condition); but provides an iteration counter, i, that can be used inside the loop. There is no upper limit on the number of iterations. You can add an upper limit if you want, with the syntax do i = 1 to 10000 by 1 until(some condition); - PG

PG
nicnad
Fluorite | Level 6

Thank you both for your replies.

Everything is much clearer now!

Best regards.

naveen_srini
Quartz | Level 8

PG is basically using that as a index or in other words count variable to retireve each word in the string one by one as the scan function does scan(string,count) where count can be number, variable or a derived expression. so in your case he is using index variable as count to extract starting from my, then cat and so on.

You could also use something like:

do until(word=' ');

  count+1;

  word = scan(string, count);

   if not missing(word) then output;

  end;

Just different ways to write the code.

Regards,

Naveen

PGStats
Opal | Level 21

You would also need to initialize count=0 before the loop.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5856 views
  • 8 likes
  • 6 in conversation