searched around but couldn't find what i need.
example,
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
i can use prxmatch("m/this|what|need/oi",string);
but it only returns the position of the first word.
how do i count all of the matching words in this string?
thanks.
I'm normally a big advocate of regular expressions, but this is simpler:
data _null_;
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
count=count(string,'this')+count(string,'what')+count(string,'need');
put count=;
run;
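If you would rather stay with regular expressions, here is a minimal sketch of counting every match instead of only the first one. It assumes the usual PRXPARSE / CALL PRXNEXT loop pattern; the variable names (re, start, stop, pos, len, n) are just illustrative. On this sample string it should give the same total as the COUNT version above.

```sas
data _null_;
  string = "lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  /* compile the pattern once; the i suffix makes it case-insensitive */
  re = prxparse("/this|what|need/i");
  start = 1;
  stop = length(string);
  n = 0;
  /* each call finds the next match and moves start past it */
  call prxnext(re, start, stop, string, pos, len);
  do while (pos > 0);
    n = n + 1;
    call prxnext(re, start, stop, string, pos, len);
  end;
  put n=;
run;
```

PRXMATCH only reports where the first match starts, which is why it cannot count on its own; the loop ends when CALL PRXNEXT returns a position of 0.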
but this only works for 3 words. i have hundreds of keywords that i would like to count. can't type it all like this. it's all in macro.
thanks.
" it's all in macro"
There is your problem right there. Data should be in datasets - that is what they are for. Once data is in datasets, then you use Base SAS code to analyze that data. For example, if I had a string in a dataset, I could achieve a count of all words quite simply with two steps:
1) a data step outputs each word from any number of strings as one observation per word
2) proc freq the resulting dataset to get a dataset with unique words and their counts within the data
Macro is not the place to be doing data processing, it is nothing more than a find/replace system for generating text.
In that case you'll need to give us a sample of your keywords, input and output in the form of have and want data sets, because (as @RW9 says) this really should be done in data step.
data k;
input k $;
cards;
this
what
need
;
run;
data have;
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
output;
run;
proc sql;
select string,sum(count(string,strip(k),'i')) as n
from have,k
group by string;
quit;
thanks everyone for the tips. i guess i should clarify a bit more. what i have is millions of records of "strings" in one variable. i have another maybe 10 or 20 lists of key words. i would like to count each list of key words in the millions of "strings" and see which list has most frequency. then i will decide how to categorize these strings. was just wondering if there is a fast way to do that. thanks. 🙂
Well, with no test data to run with I am guessing here but something like:
data biglist;
  length string $2000;
  string="a big dog walks around"; output;
  string="something happened other wise"; output;
  string="this is a wise old string with big connotations"; output;
run;

data words;
  length word $2000;
  word="dog"; output;
  word="big"; output;
  word="wise"; output;
run;

data inter (drop=i string);
  set biglist;
  do i=1 to countw(string," ");
    wrd=scan(string,i," ");
    output;
  end;
run;

proc sql;
  delete from inter
  where wrd not in (select word from words);
quit;

proc freq data=inter;
  tables wrd / out=want;
run;
You can drop the sql delete and run proc freq over all the data, then filter the results. That might use fewer resources, but you will need to try it.
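A rough sketch of that variant, extended to total the counts per keyword list. This is untested against real data and makes some assumptions: inter is built as above but without the sql delete, the list column and its values A and B are made up for illustration, and allcounts is just a name I picked for the intermediate dataset.

```sas
/* Hypothetical: each keyword carries the name of the list it belongs to */
data words;
  length list $8 word $2000;
  input list $ word $;
  cards;
A dog
A big
B wise
;
run;

/* Count every token once, with no delete step beforehand */
proc freq data=inter noprint;
  tables wrd / out=allcounts;
run;

/* Keep only the keywords and total their counts per list */
proc sql;
  create table want as
  select w.list, sum(a.count) as n
  from allcounts as a inner join words as w
    on a.wrd = w.word
  group by w.list
  order by n desc;
quit;
```

The join does the filtering here, so only tokens that appear in words contribute to the totals. Note this matches whole space-delimited words only, unlike the COUNT approach, which also finds keywords buried inside longer strings.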