Grumbler
Obsidian | Level 7

I searched around but couldn't find what I need.

For example:

string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";

I can use prxmatch("m/this|what|need/oi",string);

but it only returns the position of the first word.

How do I count all of the words in this string?

Thanks.

8 REPLIES
ChrisBrooks
Ammonite | Level 13

I'm normally a big advocate of regular expressions, but this is simpler:

data _null_;
  string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  /* COUNT returns the number of times a substring occurs */
  count=count(string,'this')+count(string,'what')+count(string,'need');
  put count=;
run;
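
That said, if you'd rather stay with your PRXMATCH approach, CALL PRXNEXT can step through every match instead of stopping at the first. A rough sketch using the same string and pattern as your post:

data _null_;
  string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  rx=prxparse("/this|what|need/i");
  start=1;
  stop=length(string);
  matches=0;
  /* PRXNEXT moves START past each match, so the loop visits them all */
  call prxnext(rx, start, stop, string, position, length);
  do while (position > 0);
    matches=matches+1;
    call prxnext(rx, start, stop, string, position, length);
  end;
  put matches=;
run;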
Grumbler
Obsidian | Level 7

But this only works for 3 words. I have hundreds of keywords that I would like to count, and I can't type them all out like this. It's all in a macro.

Thanks.

RW9
Diamond | Level 26

" it's all in macro"

There is your problem right there.  Data should be in datasets - that is what they are for.  Once data is in datasets, then you use Base SAS code to analyze that data.  For example, if I had a string in a dataset, I could achieve a count of all words quite simply with two steps:

1) datastep outputs each word of any amount fo strings to one observation per word

2) proc freq the resulting dataset to get a dataset with unique words and their counts within the data
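
A minimal sketch of those two steps, assuming your strings sit in a dataset HAVE with a character variable STRING and space-delimited words (all made-up names):

data word_list (keep=word);
  set have;
  length word $200;
  /* step 1: one observation per word */
  do i=1 to countw(string," ");
    word=scan(string,i," ");
    output;
  end;
run;

/* step 2: unique words and their counts */
proc freq data=word_list;
  tables word / out=word_counts;
run;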

 

Macro is not the place to be doing data processing; it is nothing more than a find/replace system for generating text.

ChrisBrooks
Ammonite | Level 13

In that case you'll need to give us a sample of your keywords, plus input and output in the form of have and want data sets, because (as @RW9 says) this really should be done in a data step.

Ksharp
Super User
data k;
  input k $;
cards;
this
what
need
;
run;

data have;
  string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  output;
run;

/* cross join HAVE with the keyword list and total the
   case-insensitive occurrences of every keyword per string */
proc sql;
  select string, sum(count(string,strip(k),'i')) as n
    from have, k
      group by string;
quit;
Grumbler
Obsidian | Level 7

Thanks everyone for the tips. I guess I should clarify a bit more. What I have is millions of records of "strings" in one variable. I also have maybe 10 or 20 lists of key words. I would like to count each list of key words across the millions of "strings" and see which list has the highest frequency; then I will decide how to categorize these strings. Was just wondering if there is a fast way to do that. Thanks. 🙂

RW9
Diamond | Level 26

Well, with no test data to run with I am guessing here, but something like:

data biglist;
  length string $2000;
  string="a big dog walks around"; output;
  string="something happened other wise"; output;
  string="this is a wise old string with big connotations"; output;
run;

data words;
  length word $2000;
  word="dog"; output;
  word="big"; output;
  word="wise"; output;
run;

/* one observation per word in each string */
data inter (drop=i string);
  set biglist;
  do i=1 to countw(string," ");
    wrd=scan(string,i," ");
    output;
  end;
run;

/* keep only the keywords */
proc sql;
  delete from inter
  where wrd not in (select word from words);
quit;

/* frequency of each remaining keyword */
proc freq data=inter;
  tables wrd / out=want;
run;

You can drop the SQL delete and run the freq over all the data, then filter the results; that might use fewer resources - you will need to try it.
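
A sketch of that alternative, reusing the INTER and WORDS datasets above (whether it is actually cheaper is something you would have to benchmark):

proc freq data=inter noprint;
  tables wrd / out=allcounts;  /* counts for every word, keyword or not */
run;

proc sql;
  create table want as
  select a.wrd, a.count
  from allcounts a
  inner join words w
  on a.wrd = w.word;  /* filter down to just the keywords */
quit;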

Ksharp
Super User
You could try my SQL; maybe it is not too slow. A faster way I can think of is using a hash table.
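
For what it's worth, a rough sketch of the hash idea, assuming the BIGLIST and WORDS datasets from @RW9's reply (the keyword list is loaded into memory once, so each word lookup is an in-memory check rather than a join):

data inter (keep=word);
  if _n_=1 then do;
    declare hash h(dataset:"words");  /* load the keyword list once */
    h.defineKey("word");
    h.defineDone();
  end;
  set biglist;
  length word $2000;
  do i=1 to countw(string," ");
    word=scan(string,i," ");
    if h.check()=0 then output;  /* keep only words that are keywords */
  end;
run;

proc freq data=inter;
  tables word / out=want;
run;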


Discussion stats
  • 8 replies
  • 2268 views
  • 0 likes
  • 4 in conversation