08-09-2017 04:28 AM
searched around but couldn't find what i need.
string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
i can use prxmatch("m/this|what|need/oi",string);
but it only returns the position of the first word.
how do i count all of the words in this string?
08-09-2017 04:51 AM
I'm normally a big advocate of regular expressions but this is simpler
data _null_;
  string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  count=count(string,'this')+count(string,'what')+count(string,'need');
  put count=;
run;
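If you do want to stay with regular expressions, the usual way to get past the first match is to loop with CALL PRXNEXT instead of calling PRXMATCH once. The following is only a minimal sketch, assuming the same three keywords and the same test string; the variable names (re, start, stop, pos, len, n) are my own choices.

```sas
data _null_;
  string = "lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";

  /* Compile the pattern once; the i modifier makes it case-insensitive */
  re = prxparse("/this|what|need/i");

  start = 1;               /* position where the next search begins */
  stop  = length(string);  /* last position to search               */
  n     = 0;               /* running number of matches             */

  /* CALL PRXNEXT moves start past each match and returns pos=0 when none is left */
  call prxnext(re, start, stop, string, pos, len);
  do while (pos > 0);
    n + 1;
    call prxnext(re, start, stop, string, pos, len);
  end;

  put n=;
run;
```

Matches are non-overlapping, so for this string the total should agree with the COUNT version above. For a handful of fixed keywords the COUNT approach is still simpler.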
08-09-2017 05:42 AM - edited 08-09-2017 05:44 AM
" it's all in macro"
There is your problem right there. Data should be in datasets - that is what they are for. Once data is in datasets, then you use Base SAS code to analyze that data. For example, if I had a string in a dataset, I could achieve a count of all words quite simply with two steps:
1) a data step outputs each word from any number of strings as one observation per word
2) proc freq the resulting dataset to get a dataset with unique words and their counts within the data
Macro is not the place to be doing data processing; it is nothing more than a find/replace system for generating text.
08-09-2017 06:14 AM
In that case you'll need to give us a sample of your keywords, input and output in the form of have and want data sets, because (as @RW9 says) this really should be done in data step.
08-09-2017 10:12 AM
data k;
  input k $;
  cards;
this
what
need
;
run;

data have;
  string="lksadfjlkjthisiswhatineed thisiswhat ineedlkaflkasfdlkj";
  output;
run;

proc sql;
  select string,sum(count(string,strip(k),'i')) as n
  from have,k
  group by string;
quit;
08-09-2017 09:19 PM
thanks everyone for the tips. i guess i should clarify a bit more. what i have is millions of records of "strings" in one variable. i also have maybe 10 or 20 lists of key words. i would like to count how often each list of key words occurs across the millions of "strings" and see which list has the highest frequency. then i will decide how to categorize these strings. was just wondering if there is a fast way to do that. thanks.
08-10-2017 03:56 AM
Well, with no test data to run with I am guessing here but something like:
data biglist;
  length string $2000;
  string="a big dog walks around"; output;
  string="something happened other wise"; output;
  string="this is a wise old string with big connotations"; output;
run;

data words;
  length word $2000;
  word="dog"; output;
  word="big"; output;
  word="wise"; output;
run;

/* One observation per word in each string */
data inter (drop=i string);
  set biglist;
  do i=1 to countw(string," ");
    wrd=scan(string,i," ");
    output;
  end;
run;

/* Keep only the words that appear in the keyword list */
proc sql;
  delete from inter
  where wrd not in (select word from words);
quit;

proc freq data=inter;
  tables wrd / out=want;
run;
You can drop the SQL delete and run PROC FREQ over all the data, then filter the results. That might use fewer resources, but you will need to try it.
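As a rough sketch of that variant, extended to several keyword lists: count every word once with PROC FREQ, then join the counts to a keyword table and total them per list. The dataset wordlists, its list column, and the output names allcounts and list_totals are made up for illustration; the sample strings are the ones from the code above. Note that the join is case-sensitive and only matches whole space-delimited words, so it will not find keywords buried inside longer strings.

```sas
/* Same sample strings as above */
data biglist;
  length string $2000;
  string="a big dog walks around"; output;
  string="something happened other wise"; output;
  string="this is a wise old string with big connotations"; output;
run;

/* Hypothetical keyword lists: one row per (list, word) pair */
data wordlists;
  length list $8 word $200;
  list="A"; word="dog";  output;
  list="A"; word="big";  output;
  list="B"; word="wise"; output;
run;

/* One observation per word, with no delete step afterwards */
data inter (keep=wrd);
  set biglist;
  length wrd $200;
  do i=1 to countw(string," ");
    wrd=scan(string,i," ");
    output;
  end;
run;

/* Frequency of every distinct word; the OUT= dataset holds wrd, COUNT and PERCENT */
proc freq data=inter noprint;
  tables wrd / out=allcounts;
run;

/* Keep only keywords and total the counts per list, highest first */
proc sql;
  create table list_totals as
  select l.list, sum(c.count) as n
  from allcounts as c inner join wordlists as l
    on c.wrd = l.word
  group by l.list
  order by n desc;
quit;
```

Because the join runs against the already aggregated allcounts table rather than the millions of raw rows, the filtering step should stay small, but as said above you would need to time it on your real data.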