09-20-2017 01:21 PM - last edited on 09-20-2017 01:31 PM by Reeza
This is a sample of what the data looks like:
data have;
input text $50.;
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
The key words are: decision/decide, sponsor, withdrawn
How can I efficiently search for the key words in any order they appear?
Note: Edited by Reeza for clarity and legibility.
09-20-2017 01:34 PM
Search for them one at a time:
decide = find(text, 'decide', 'i') > 0;
sponsor = find(text, 'sponsor', 'i') > 0;
withdraw = find(text, 'withdraw', 'i') > 0;
This gives you three variables, each either 0 or 1, indicating whether the string was found. The 'i' modifier says to ignore upper case vs. lower case differences.
You could consider FINDW instead of FIND, but for your purposes FIND looks like the better choice: it matches substrings, so the 0/1 value for the variable WITHDRAW indicates the presence of any of these strings: withdraw, withdraws, withdrawn. It does not locate "withdrew", however. Create as many flags as you need, perhaps a separate one for "decision".
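As a minimal sketch of how these flags might be combined, assuming the sample dataset HAVE with the variable TEXT from the question: the names WITHDRAW_WORD and ALL_FOUND are hypothetical and only serve to contrast FIND with FINDW and to show one way of requiring every key word at once.

```sas
data want;
  set have;
  /* FIND matches substrings, so 'withdraw' also flags "withdrawn" */
  decide   = find(text, 'decide',   'i') > 0;
  decision = find(text, 'decision', 'i') > 0;
  sponsor  = find(text, 'sponsor',  'i') > 0;
  withdraw = find(text, 'withdraw', 'i') > 0;
  /* FINDW matches whole words only, so "withdrawn" alone gives 0 here */
  withdraw_word = findw(text, 'withdraw', ' ', 'i') > 0;
  /* hypothetical combined flag: every key word present, in any order */
  all_found = (decide or decision) and sponsor and withdraw;
run;
```

Because each flag is computed independently, the order in which the words appear in TEXT does not matter.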
09-20-2017 01:39 PM
It seems like you've been asking a few questions related to REGEX recently, so I thought this may be a useful reference:
You can use it to build and test your strings.
09-20-2017 02:42 PM
The syntax below gives the number of key words appearing in each observation. It can be modified to create a 0/1 flag variable for each key word.
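length list $50;
num_keywords = 0; /* reset for each observation; the sum statement below would otherwise keep a running total */
do list = 'decision', 'decide', 'sponsor', 'withdrawn';
if find(trim(text), trim(list), 'i') > 0 then num_keywords + 1;
end;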
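One possible way to get a 0/1 flag per key word is to loop over two parallel arrays. This is only a sketch, assuming the dataset HAVE and variable TEXT from the question; the flag names (HAS_DECISION and so on) and the output dataset WANT are made up for illustration.

```sas
data want;
  set have;
  /* key words to look for, held in a temporary array */
  array kw[4] $10 _temporary_ ('decision' 'decide' 'sponsor' 'withdrawn');
  /* one 0/1 flag per key word (hypothetical variable names) */
  array flag[4] has_decision has_decide has_sponsor has_withdrawn;
  do i = 1 to dim(kw);
    flag[i] = find(text, trim(kw[i]), 'i') > 0;
  end;
  /* per-observation count of key words found */
  num_keywords = sum(of flag[*]);
  drop i;
run;
```

Since NUM_KEYWORDS is recomputed from the flags on every observation, it does not carry over from one row to the next.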
09-20-2017 09:18 PM
1. What should the output look like?
2. RegEx brings no benefit for such a simple search. Consider using index() or find() as shown above.
FYI, the match string would be something like:
data HAVE;
input TEXT $50.;
MATCH=prxmatch('m/deci(sion|de)|sponsor|withdrawn/i',TEXT);
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
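The pattern above matches when any one of the key words is present. If the goal is to require all of them, in any order, one option is a set of lookaheads anchored at the start of the string. The following is a sketch under that assumption rather than a recommendation over find(); ALL_KEYS and WANT are hypothetical names, and 'withdraw' is used so that "withdrawn" is matched as well.

```sas
data want;
  set have;
  /* Each (?=...) lookahead is tested from the start of the string,
     so the three key word groups may occur in any order */
  all_keys = prxmatch('/^(?=.*deci(sion|de))(?=.*sponsor)(?=.*withdraw)/i', text) > 0;
run;
```

Here ALL_KEYS is 1 only when a form of decide/decision, "sponsor", and "withdraw" all occur somewhere in TEXT.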