Word Search Using Prxmatch

Occasional Contributor
Posts: 11


I am using the prxmatch function to search for words in a variable, but the results are not exact matches. In particular, I was searching for the word "tree", but I also got "street". How do I get an exact match, i.e., exclude results where the word I am searching for is only part of a longer word?

 

Thanks.  

PROC Star
Posts: 307

Re: Word Search Using Prxmatch

It's hard to say without knowing exactly what you're searching for or what your data might have, but for the example you provided, something like prxmatch('/\btree\b/i', my_text) should fix the issue. The \b signifies a word boundary. If you just put a space before "tree" in the pattern instead, you would miss any values where "tree" starts the string.
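To make the difference concrete, here is a minimal sketch that runs the pattern with and without \b on a few made-up rows. The dataset name demo, the sample text, and the flag names loose and strict are assumptions for illustration only; my_text stands in for your real variable.

```sas
/* Hypothetical sample rows; replace with your own dataset and variable. */
data demo;
    infile datalines truncover;
    input my_text $char40.;
    /* Without \b, "tree" also matches inside "street". */
    loose  = prxmatch('/tree/i', my_text) > 0;
    /* With \b on both sides, only the whole word "tree" matches. */
    strict = prxmatch('/\btree\b/i', my_text) > 0;
    datalines;
Tree fell on the car
a man walking on the street
broken tree in the yard
;
run;

proc print data=demo;
run;
```

The "street" row should be flagged by loose but not by strict, while the row that starts with "Tree" is still caught because \b also matches at the start of the string.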

Occasional Contributor
Posts: 11

Re: Word Search Using Prxmatch

Posted in reply to collinelliot

Thanks for the reply. Below is the code I wrote to find the cases that involve a broken tree:

 

data spct.tree;
set treedata;
if prxmatch("m/trees|limbs|branches/oi", combined_description) > 0 then tree=1;
else tree=0;
run;

 

However, like I said, the results include cases such as "a man walking on the street" because "tree" is part of the word "street". Since I am searching for multiple words that I think are related to trees, as you can see, how or where do I add the "\b" to take care of the problem? Thanks, I truly appreciate your help.

PROC Star
Posts: 307

Re: Word Search Using Prxmatch

Again, this might not do everything you want depending on your data, but this is how you'd incorporate the word boundary into your code:

 

if prxmatch("m/\btrees\b|\blimbs\b|\bbranches\b/oi", combined_description) > 0 then tree=1;

 

Note that you can also get the 0, 1 boolean you want by using:

 

tree = prxmatch(....) > 0;
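Putting the two suggestions together, the data step could look roughly like the sketch below. It reuses the dataset and variable names from the earlier post and the same pattern as above; treat it as one possible form, not the only one.

```sas
data spct.tree;
    set treedata;
    /* The comparison returns 1 when a whole-word match is found and 0
       otherwise, so no IF/ELSE is needed. */
    tree = prxmatch("m/\btrees\b|\blimbs\b|\bbranches\b/oi", combined_description) > 0;
run;
```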

Respected Advisor
Posts: 4,925

Re: Word Search Using Prxmatch

To match both singular and plural words, you could use "m/\b(trees?|limbs?|branch(es)?)\b/oi"

 

\b means word boundary

? means match zero or one occurrence of the preceding character or group
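As a quick check of that pattern, here is a small sketch on made-up rows. The dataset name plural_demo and the sample text are assumptions for illustration; combined_description is the variable name from the earlier posts.

```sas
/* Hypothetical sample rows to exercise the singular/plural pattern. */
data plural_demo;
    infile datalines truncover;
    input combined_description $char60.;
    /* One pair of \b around the group covers every alternative. */
    tree = prxmatch("m/\b(trees?|limbs?|branch(es)?)\b/oi", combined_description) > 0;
    datalines;
a tree limb fell on the roof
broken branches blocked the road
a man walking on the street
branching out into new markets
;
run;

proc print data=plural_demo;
run;
```

The first two rows should get tree=1. The "street" and "branching" rows should get tree=0, because the closing \b requires the match to end at a word boundary.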

PG
Occasional Contributor
Posts: 11

Re: Word Search Using Prxmatch

 