This is a sample of what the data looks like:
data have;
input text $50.;
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
The key words are: decision/decide, sponsor, withdrawn
How can I efficiently search for these key words, in whatever order they appear?
Note: Edited by Reeza for clarity and legibility.
Search for them one at a time:
data want;
set have;
decide = find(text, 'decide', 'i') > 0;
sponsor = find(text, 'sponsor', 'i') > 0;
withdraw = find(text, 'withdraw', 'i') > 0;
run;
This gives you three variables, each either 0 or 1, indicating whether the string was found. The 'i' modifier says to ignore upper case vs. lower case differences.
You could also consider FINDW instead of FIND, but for your purposes FIND looks better: FINDW matches whole words only, while FIND matches any substring. So the 0/1 value for the variable WITHDRAW indicates the presence of any of these strings: withdraw, withdraws, withdrawn. It does not find "withdrew", however, and "decide" does not match "decision". So create as many flags as you need, perhaps a separate one for "decision".
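Following the "create as many flags as needed" suggestion, here is a minimal sketch that adds a flag covering both "decide" and "decision", widens WITHDRAW to also catch "withdrew", and combines the three groups into one indicator. It assumes the HAVE data set and TEXT variable from the question; the ALL_FOUND variable name is hypothetical. Because each FIND call scans the whole string on its own, the order of the words in TEXT does not matter.

```sas
/* One 0/1 flag per keyword group (order in TEXT is irrelevant,         */
/* since each FIND call scans the whole string independently).          */
data want;
  set have;
  decide   = find(text, 'decide', 'i') > 0 or find(text, 'decision', 'i') > 0;
  sponsor  = find(text, 'sponsor', 'i') > 0;
  withdraw = find(text, 'withdraw', 'i') > 0 or find(text, 'withdrew', 'i') > 0;
  /* ALL_FOUND (hypothetical name) is 1 only when every group is present */
  all_found = (decide and sponsor and withdraw);
run;
```

With the sample data, the last two rows get ALL_FOUND = 1 and the first two get 0, because they contain no form of "decide"/"decision".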
It seems like you've been asking a few questions related to REGEX recently, so I thought this may be a useful reference:
You can use it to build and test your strings.
Hi,
The syntax below counts the number of key words that appear in each observation. It can be modified to create a 0/1 flag variable for each of the keywords.
data want(drop=list);
  set have;
  num_keywords = 0;
  length list $50;
  /* Loop over the keywords; each one found anywhere in TEXT adds 1 */
  do list = 'decision', 'decide', 'sponsor', 'withdrawn';
    if find(trim(text), trim(list), 'i') > 0 then num_keywords + 1;
  end;
run;
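The modification mentioned above, one 0/1 flag per keyword instead of a single count, could be sketched with arrays as follows. It assumes the HAVE data set and TEXT variable from the question; the array names and the FLAG_* variable names are hypothetical. STRIP removes the trailing blanks that pad each keyword to the array's declared length, which would otherwise make FIND fail.

```sas
/* One 0/1 flag per keyword; array and flag names are illustrative only */
data want(drop=i);
  set have;
  array kw[4] $10 _temporary_ ('decision' 'decide' 'sponsor' 'withdrawn');
  array flag[4] flag_decision flag_decide flag_sponsor flag_withdrawn;
  do i = 1 to dim(kw);
    /* STRIP drops the blank padding of the $10 array elements */
    flag[i] = find(text, strip(kw[i]), 'i') > 0;
  end;
  /* Same total as NUM_KEYWORDS in the loop version */
  num_keywords = sum(of flag[*]);
run;
```

Keeping the keywords in a temporary array means that adding a keyword only requires extending the two array statements.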
Do the words have to appear in the same order as they are declared in the list? What if the order changes?
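1. What should the output look like?
2. RegEx brings no benefit for such a simple search. Consider using index() or find() as shown above. Each call searches the whole string independently, so the order of the words does not matter.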
FYI, the regular expression would be something like this:
data HAVE;
input TEXT $50.;
/* MATCH = position of the first keyword found, or 0 if none */
MATCH=prxmatch('m/deci(sion|de)|sponsor|withdrawn/i',TEXT);
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
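As written, MATCH only shows that at least one keyword is present. If all of them must be present in any order, one option is a single pattern built from lookaheads. This is a sketch assuming the SAS PRX engine accepts the Perl zero-width lookahead syntax `(?=...)`; the ALL_FOUND variable name is hypothetical, and "withdraw" is used so that "withdrawn" also matches.

```sas
data want;
  set HAVE;
  /* Each (?=.*...) lookahead scans from the start of TEXT without      */
  /* consuming characters, so the keywords can appear in any order.     */
  ALL_FOUND = prxmatch('/^(?=.*deci(sion|de))(?=.*sponsor)(?=.*withdraw)/i', TEXT) > 0;
run;
```

For a handful of fixed keywords, the separate FIND calls are usually easier to read and maintain than this pattern.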