SASPhile
Quartz | Level 8

This is a sample of what the data looks like:

 

data have;
input text $50.;
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;

 

The key words are: decision/decide, sponsor, withdrawn

 

How can I efficiently search for the key words in any order they appear?

 

 

Note: Edited by Reeza for clarity and legibility.

5 REPLIES
Astounding
PROC Star

Search for them one at a time:

 

data want;
set have;
decide = find(text, 'decide', 'i') > 0;
sponsor = find(text, 'sponsor', 'i') > 0;
withdraw = find(text, 'withdraw', 'i') > 0;
run;

 

This gives you three variables, each either 0 or 1, indicating whether the string was found.  The 'i' modifier says to ignore upper case vs. lower case differences.

 

It is possible to consider FINDW instead of FIND, but for your purposes it looks like FIND is better.  So the 0/1 value for the variable WITHDRAW indicates the presence of any of these strings:  withdraw, withdraws, withdrawn.  It does not locate "withdrew" however.  So create as many flags as are needed ... perhaps a separate one for "decision".
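To make the FIND versus FINDW difference concrete, here is a minimal sketch run against the sample data. The output names compare, withdraw_find and withdraw_findw are made up for illustration, and a blank is assumed to be the only word delimiter.

```sas
data compare;
   set have;
   /* FIND matches substrings, so 'withdraw' is also found inside 'withdrawn' */
   withdraw_find  = find(text, 'withdraw', 'i') > 0;
   /* FINDW matches whole words only (blank-delimited here),
      so 'withdrawn' does not count as 'withdraw' */
   withdraw_findw = findw(text, 'withdraw', ' ', 'i') > 0;
run;
```

For "sponsor withdrawn", the FIND flag should be 1 and the FINDW flag 0, which is why FIND suits this search better.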

 

Reeza
Super User

It seems like you've been asking a few questions related to REGEX recently, so I thought this may be a useful reference:

https://regex101.com/

 

You can use it to build and test your strings. 

 

 

 

 

stat_sas
Ammonite | Level 13

Hi,

 

The syntax below gives the number of key words appearing in each observation. It can be modified to create a 0/1 flag variable for each keyword.

 

data want(drop=list);
set have;
num_keywords=0;
length list $50;
/* loop over the key words; FIND looks anywhere in TEXT, so word order does not matter */
do list = 'decision', 'decide', 'sponsor', 'withdrawn';
      if find(trim(text), trim(list), 'i') > 0 then num_keywords+1;
end;
run;
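One possible way to turn the count into 0/1 flags is sketched below. The flag names (has_decide, has_sponsor, has_withdraw), the all_found variable and the shortened stems ('deci' to cover both "decision" and "decide", 'withdraw' to cover "withdrawn") are assumptions for illustration, not part of the original request.

```sas
data want;
   set have;
   /* hypothetical flag names, one per key word stem */
   array flag[3] has_decide has_sponsor has_withdraw;
   array stem[3] $8 _temporary_ ('deci', 'sponsor', 'withdraw');
   do i = 1 to dim(flag);
      /* STRIP removes the padding blanks of the $8 array element */
      flag[i] = find(text, strip(stem[i]), 'i') > 0;
   end;
   /* 1 only when every stem occurs somewhere in TEXT, in any order */
   all_found = (sum(of flag[*]) = dim(flag));
   drop i;
run;
```

Because each FIND call scans the whole string independently, the order in which the words appear in TEXT has no effect on the flags.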

SASPhile
Quartz | Level 8

Do the words in the list have to appear in the order they are declared? What if the order changes?

ChrisNZ
Tourmaline | Level 20

1. What should the output look like?

2. RegEx brings no benefit for such a simple search. Consider using index() or find() as shown.

FYI, the match string would be something like:


data HAVE;
input TEXT $50.;
/* position of the first key word matched, or 0 if none is found */
MATCH=prxmatch('m/deci(sion|de)|sponsor|withdrawn/i',TEXT);
cards;
sponsor withdrawn
withdrawn by sponsor
decision taken by sponsor to withdraw
sponsor decided to withdraw
;
run;
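The pattern above matches when any one of the key words is present. If the requirement is instead that all of them appear, in any order, a regular expression can express that with lookaheads. This is only a sketch: the all_found variable name and the choice of 'withdraw' as the stem are assumptions.

```sas
data want;
   set have;
   /* each (?=...) lookahead scans the whole string from the start,
      so the key words may appear in any order */
   all_found = prxmatch('/^(?=.*deci(sion|de))(?=.*sponsor)(?=.*withdraw)/i', text) > 0;
run;
```

Three separate find() calls combined with AND would give the same result and are arguably easier to read.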

 
