I am using the prxmatch function to search for words in a variable. However, the results are not exact matches. In particular, I was searching for the word "tree", but I also got "street". How do I get an exact match, i.e., exclude results where the word I am searching for only appears inside a longer word?
Thanks.
It's hard to say without knowing exactly what you're searching for or what your data might contain, but for the example you provided something like prxmatch('/\btree\b/i', my_text) should fix the issue. The \b signifies a word boundary. If you used a leading space instead, you would miss any values where "tree" starts the string.
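To see the difference, here is a minimal sketch that compares a plain pattern, a word-boundary pattern, and a leading-space pattern. The three sample strings and the variable names (text, no_boundary, with_boundary, with_space) are made up for illustration and are not from your data.

```sas
data _null_;
   length text $40;
   /* Hypothetical sample values, for illustration only */
   do text = 'a man walking on the street', 'tree on the road', 'a broken tree';
      /* Plain pattern: also matches "tree" inside "street" */
      no_boundary   = prxmatch('/tree/i', text);
      /* Word boundaries: matches "tree" only as a whole word */
      with_boundary = prxmatch('/\btree\b/i', text);
      /* Leading space: misses "tree" at the start of the string */
      with_space    = prxmatch('/ tree/i', text);
      put text= no_boundary= with_boundary= with_space=;
   end;
run;
```

prxmatch returns the position of the first match, or 0 when there is none, so a 0 in the log means that pattern did not match that string.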
Thanks for the reply. Below is the code I wrote to find the cases that involve a broken tree:
data spct.tree;
set treedata;
if prxmatch("m/trees|limbs|branches/oi", combined_description) > 0 then tree=1;
else tree=0;
run;
However, like I said, the results include cases such as "a man walking on the street", because "tree" is part of the word "street". Since, as you can see, I am searching for multiple words that I think are related to trees, how or where do I add the "\b" so that it takes care of the problem? Thanks, I truly appreciate your help.
Again, this might not do everything you want depending on your data, but this is how you'd incorporate the word boundary into your code:
if prxmatch("m/\btrees\b|\blimbs\b|\bbranches\b/oi", combined_description) > 0 then tree=1;
Note that you can also get the 0, 1 boolean you want by using:
tree = prxmatch(....) > 0;
To match singular and plural words, you could use "m/\b(trees?|limbs?|branch(es)?)\b/oi"
\b means word boundary
? means match zero or one occurrence of the preceding character or group
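Putting the pieces together, here is a small self-contained sketch you can run to check the pattern. The four sample rows and the output dataset name tree_flags are assumptions for illustration; treedata and combined_description are the names from the code posted above, and the /o option is left out because it is not needed for a constant pattern.

```sas
/* Hypothetical sample rows; replace with the real treedata */
data treedata;
   infile datalines truncover;
   input combined_description $char80.;
   datalines;
a man walking on the street
Tree fell across the road
broken limbs and branches
branch hit the car
;
run;

data tree_flags;
   set treedata;
   /* \b on both sides of the group applies to every alternative;
      the comparison "> 0" yields 1 for a match and 0 otherwise */
   tree = prxmatch('/\b(trees?|limbs?|branch(es)?)\b/i', combined_description) > 0;
run;

proc print data=tree_flags;
run;
```

With these sample rows, the "street" row should get tree=0 and the other three rows tree=1. Real descriptions may still need extra terms or exclusions.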