DATA Step, Macro, Functions and more

How to count the frequency like this?

Accepted Solution Solved
Reply
Contributor
Posts: 49
Accepted Solution

How to count the frequency like this?

Dear All,

I'm doing one text mining project.One observation is one string and one indicator. I need to count the frequency of each word in the string and I will get one new dataset with many columns. The name of one column is the word and the content is the the frequency of the word in one observation. The indicator remains unchanged. How to implent this using sas if I have hundreds of words in one string? Thanks.

Have

stringindicator
a b c1
a c d d a1
a b c2
f g g h2
e f e a m1

Want

abcedfghmindicator
1110000001
2010200001
1110000002
0000012102
1002010011

Tomi


test.JPG

Accepted Solutions
Solution
‎08-05-2013 08:45 AM
Super User
Super User
Posts: 7,060

Re: How to count the frequency like this?

Posted in reply to MikeTurner

You will have trouble if your "words" are not valid SAS variable names.

data tall / view=tall ;

  row + 1;

  set have ;

  do i=1 to countw(string,' ');

    word = scan(string,i,' ');

    output ;

  end;

  keep row indicator word ;

run;

proc summary data=tall nway ;

  by row indicator ;

  class word ;

  output out=tall_count;

run;

proc transpose data=tall_count out=want (drop=_name_);

  by row indicator ;

  var _freq_ ;

  id word ;

run;

proc print; run;

View solution in original post


All Replies
Super Contributor
Posts: 297

Re: How to count the frequency like this?

Posted in reply to MikeTurner

How about this?

DATA STUFF;

  SET HAVE;

  A = 0; B = 0; C = 0; D = 0; E = 0; F = 0; G = 0; H = 0; M = 0;

  ARRAY VARS A B C D E F G H M;

  DO J = 1 TO DIM(VARS);

  DO I = 1 TO LENGTH(COMPRESS(STRING));

  IF VNAME(VARS{J}) = SCAN(STRING,I) THEN VARS{J} = VARS{J}+INDICATOR;

  END;

  END;

  DROP I J;

RUN;

Contributor
Posts: 49

Re: How to count the frequency like this?

Posted in reply to Scott_Mitchell

Thanks. But it will not work if if you have hundreds of words in one string................

Super Contributor
Posts: 297

Re: How to count the frequency like this?

Posted in reply to MikeTurner

You probably should provide a sample that is representative of the data you will be working with.

It appears to work for me without issue for 200 characters.

Respected Advisor
Posts: 3,799

Re: How to count the frequency like this?

Posted in reply to MikeTurner

As long as the words are "SAS Names".

data ind;
   input string &$32. indicator;
   id + 1;
  
cards;
a b c  1
a c d d a  1
a b c  2
f g g h  2
e f e a m  1
;;;;
   run;
data ind2;
   set ind;
   length word $32;
  
do i = 1 by 1;
      word = scan(string,i,
' ');
      if missing(word) then leave;
      output;
     
end;
  
run;
proc summary data=ind2 nway;
  
class id indicator word;
   output out=sum(drop=_type_);
   run;
proc transpose data=sum out=final;
   by id indicator;
   var _freq_;
   id word;
   run;
proc stdize reponly out=poked missing=0;
  
var _numeric_;
   run;
Solution
‎08-05-2013 08:45 AM
Super User
Super User
Posts: 7,060

Re: How to count the frequency like this?

Posted in reply to MikeTurner

You will have trouble if your "words" are not valid SAS variable names.

data tall / view=tall ;

  row + 1;

  set have ;

  do i=1 to countw(string,' ');

    word = scan(string,i,' ');

    output ;

  end;

  keep row indicator word ;

run;

proc summary data=tall nway ;

  by row indicator ;

  class word ;

  output out=tall_count;

run;

proc transpose data=tall_count out=want (drop=_name_);

  by row indicator ;

  var _freq_ ;

  id word ;

run;

proc print; run;

Respected Advisor
Posts: 3,156

Re: How to count the frequency like this?

Posted in reply to MikeTurner

Here is another option using Hash objects.

data have;

input string :&$10.     indicator;

cards;

a b c 1

a c d d a   1

a b c       2

f g g h         2

e f e a m   1

;

data _null_;

  if _n_=1 then do;

  length mvar $ 200;

  retain var ' ';/*This would be your maximum length of your individual word*/

   declare hash h();

   h.definekey('var');

   h.definedata('var');

   h.definedone();

   declare hiter hi('h');

end;

set have end=last;

   do i=1 to countw(string);

     var=scan(string,i);

      rc=h.ref();

      end;

if last then do;

   do rc=hi.first() by 0 while (rc=0);

    mvar=catx(' ',mvar,var);

     rc=hi.next();

   end;

   call symputx('vname',mvar);

end;

 

run;

data want;

  array _var &vname;

  set have;

   do over _var;

     do i=1 to countw(string);

     _var+(vname(_var)=scan(string,i));

       end;

   end;

   output;

   call missing(of _var(*));

   drop string i;

run;


Several things to keep in mind:

1. the max length of macro variable is 64K (65,534) in SAS 9.3, the it works if the total length of all of your distinct words(plus blanks in between) is less than that.

2. the max length of any individual word is 32 char.

3. In current code, case is NOT ignored. It can be easily flipped.

Good luck,

Haikuo

Contributor
Posts: 49

Re: How to count the frequency like this?

Posted in reply to MikeTurner

Thanks a lot for all kind reply. Greatly appreciated.

Mike

Regular Contributor
Posts: 195

Re: How to count the frequency like this?

Posted in reply to MikeTurner

I know it has been marked as ANSWERED...but still here is my try if you like...

data want;

   set have;

   i = 1;

   str = scan(string,i);

   do while(str NE " ");

      j + 1;

      _str_ = str;

        i = i + 1;

        str = scan(string,i);

        output;

   end;

run;

proc sort data = want;

   by string indicator;

run;

proc freq data = want noprint;

   tables _str_ / out = freq(drop = percent);

   by string  indicator;

run;

proc transpose data = freq out = want(drop = _name_ _label_);

   by string indicator;

   var count;

   id _str_;

run;

data want(drop = i);

   set want;

   array temp(*) _numeric_;

   do i = 1 to dim(temp);

     if temp(i) = . then temp(i) = 0;

   end;

run;

-Urvish

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 1085 views
  • 6 likes
  • 6 in conversation