
Regex

Super Contributor
Posts: 717


This is a sample of what the data looks like:

 

data have;
input text $50.;
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;

 

The key words are: decision/decide, sponsor, withdrawn

 

How can I efficiently search for these key words in whatever order they appear?

Note: Edited by Reeza for clarity and legibility.

Super User
Posts: 6,908

Re: Regex

Search for them one at a time:

 

data want;
set have;
/* 1 if the substring occurs anywhere in TEXT, else 0; 'i' ignores case */
decide   = find(text, 'decide', 'i') > 0;
sponsor  = find(text, 'sponsor', 'i') > 0;
withdraw = find(text, 'withdraw', 'i') > 0;
run;

 

This gives you three variables, each either 0 or 1, indicating whether the corresponding string was found. The 'i' modifier tells FIND to ignore differences between upper and lower case. Because each keyword is searched for separately, the order in which the words appear does not matter.

You could consider FINDW instead of FIND, but FINDW matches whole words only, and for your purposes FIND looks better: because it matches substrings, the 0/1 value of WITHDRAW flags any of withdraw, withdraws, and withdrawn. It does not catch "withdrew", however. So create as many flags as you need, perhaps a separate one for "decision" (which does not contain "decide").
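As a rough sketch of that suggestion (assuming the HAVE data set from the question), the flags can be widened to cover "decision" and "withdrew" and then combined into one indicator. Each FIND call scans the whole of TEXT on its own, so word order is irrelevant. ALL_FOUND is a hypothetical variable name, and the keyword groupings are my assumption, not part of the original answer.

```sas
/* Sample data from the question */
data have;
   input text $50.;
   cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;

data want;
   set have;
   /* Each flag is 1 if any spelling in its group occurs anywhere in TEXT */
   decide   = find(text, 'decide', 'i') > 0 or find(text, 'decision', 'i') > 0;
   sponsor  = find(text, 'sponsor', 'i') > 0;
   withdraw = find(text, 'withdraw', 'i') > 0 or find(text, 'withdrew', 'i') > 0;
   /* Hypothetical combined flag: 1 only when all three groups are present */
   all_found = decide and sponsor and withdraw;
run;
```

With the sample data, rows 3 and 4 get ALL_FOUND = 1, while rows 1 and 2 have no decide/decision word and get 0.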

 

Super User
Posts: 23,963

Re: Regex

It seems you've been asking a few questions about regular expressions recently, so I thought this might be a useful reference:

https://regex101.com/

 

You can use it to build and test your regular expressions.

Trusted Advisor
Posts: 1,270

Re: Regex

Hi,

 

The code below counts how many of the key words appear in each observation. It can be modified to create a 0/1 flag variable for each keyword.

data want(drop=list);
set have;
num_keywords=0;
length list $50;
do list = 'decision', 'decide','sponsor', 'withdrawn';
      if find(trim(text), trim(list),'i') > 0 then num_keywords+1;
end;
run;
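Along the lines of the modification mentioned above, here is a minimal sketch that keeps one 0/1 flag per keyword in an array and derives the count from those flags. The flag names F_DECISION, F_DECIDE, F_SPONSOR and F_WITHDRAW are hypothetical, and 'withdraw' is used in place of 'withdrawn' (an assumption) so that "to withdraw" in rows 3 and 4 is also caught.

```sas
/* Sample data from the question */
data have;
   input text $50.;
   cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;

data want(drop=i);
   set have;
   /* Keyword list in a temporary array, so it is not written to WANT */
   array kw[4] $12 _temporary_ ('decision' 'decide' 'sponsor' 'withdraw');
   /* One hypothetical 0/1 flag per keyword, in the same order as KW */
   array flag[4] f_decision f_decide f_sponsor f_withdraw;
   do i = 1 to dim(kw);
      flag[i] = find(text, trim(kw[i]), 'i') > 0;
   end;
   /* Number of distinct keywords found in this observation */
   num_keywords = sum(of flag[*]);
run;
```

With the sample data this gives NUM_KEYWORDS = 2, 2, 3, 3; rows 3 and 4 differ in which of F_DECISION or F_DECIDE is set.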

Super Contributor
Posts: 717

Re: Regex

Do the words have to appear in the order they are declared in the list? What if the order changes?

Super User
Posts: 2,499

Re: Regex

1. What should the output look like?

2. Regular expressions bring no benefit for such a simple search. Consider using INDEX() or FIND() as shown above.

FYI, the pattern would be something like:


data HAVE;
input TEXT $50.;
/* Position where the first match of any keyword starts; 0 if none matches */
MATCH=prxmatch('m/deci(sion|de)|sponsor|withdrawn/i',TEXT);
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
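On the follow-up question about word order: the alternation above succeeds as soon as any one keyword is present. If instead every keyword group must be present, in any order, one option (a sketch, not taken from the thread) is a pattern built from lookaheads, each of which rescans the string from the start. This assumes the (?=...) lookahead supported by SAS Perl regular expressions; ALL_FOUND is a hypothetical name, and the withdr(aw|ew) group is an assumption that also catches "withdrew".

```sas
/* Sample data from the question */
data have;
   input text $50.;
   cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;

data want;
   set have;
   /* Each (?=.*...) lookahead scans from the start of TEXT, so the
      three keyword groups may appear in any order */
   all_found = prxmatch('/^(?=.*deci(sion|de))(?=.*sponsor)(?=.*withdr(aw|ew))/i', text) > 0;
run;
```

Rows 3 and 4 should get ALL_FOUND = 1 and rows 1 and 2 should get 0, matching the FIND-based flags.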

 
