BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jarvin99
Obsidian | Level 7

I have long strings such as "James E Gooding will step down as Chairman and remain as Financial Expert of the Audit Committee" and "Sushi Danooo Kronick steps down as Chairman but remains as Member of the Investment and Finance Committee".

I want to keep all the words before the first word "and" or "but".

So, the results I want are "James E Gooding will step down as Chairman" and "Sushi Danooo Kronick steps down as Chairman".

I have tried scan and substr, but for scan, it seems there is a limit for extracting words. Only parts of the strings were copied. For substr,  it does not work as the log said the word types were changed.

 

Thank you so much.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You want FINDW() to look for a WORD.

data want;
  set have;
  loc1=findw(string,'and','i');
  loc2=findw(string,'but','i');
  if loc1 and loc2 then loc=min(loc1,loc2);
  else if loc1 then loc=loc1;
  else if loc2 then loc=loc2;
  if loc then want=substrn(string,1,loc-1);
run;

View solution in original post

7 REPLIES 7
maguiremq
SAS Super FREQ

I wouldn't be surprised if I'm overlooking a more concise solution.

 

data have;
var = 'James E Gooding will step down as Chairman and remain as Financial Expert of the Audit Committee'; output;
var = 'Sushi Danooo Kronick steps down as Chairman but remains as Member of the Investment and Finance Committee'; output;
;
run;

data want;
	set have;
	want = trim(substr(var, 1, prxmatch('/and|but/', var) - 1));
run;

maguiremq_0-1650290578482.png

Jarvin99
Obsidian | Level 7
Hi, may I know why substr(description,1,find(description,'and','i')-1) does not work? What is the difference between find and prxmatch? Thank you so much.
ballardw
Super User

@Jarvin99 wrote:
Hi, may I know why substr(description,1,find(description,'and','i')-1) does not work?

FIND will find the string even when embedded in a  word:

data example; 
   string ="brand name information and other information";
   x = find(string,'and');
run;

Which returns 3 for the found position in the middle of "brand".

Sometimes you get away with it depending on the data but a relatively common string like "and" is in many words.

FINDW will restrict to things on "word" boundaries, which can be specified by additional parameters.

 

Hint: when you say "does not work" it often is a good idea to provide a working example and we can indicate what was found instead of what you expected.

Tom
Super User Tom
Super User

You want FINDW() to look for a WORD.

data want;
  set have;
  loc1=findw(string,'and','i');
  loc2=findw(string,'but','i');
  if loc1 and loc2 then loc=min(loc1,loc2);
  else if loc1 then loc=loc1;
  else if loc2 then loc=loc2;
  if loc then want=substrn(string,1,loc-1);
run;
Jarvin99
Obsidian | Level 7
I see. Thank you for pointing out my mistake.
Jarvin99
Obsidian | Level 7

Hi,

Sorry I have a question again. I tried to add one more delimiter "to become".


data test;
set have;
 loc1=findw(description,'and');
 loc2=findw(description,'but');
loc3=findw(description,'to become');
if loc1 and loc2 then loc=min(loc1, loc2);
 if loc1 and loc3 then loc=min(loc1, loc3);
if loc2 and loc3 then loc=min(loc2, loc3);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
 else if loc3 then loc=loc3;
if loc then description_1=substrn(description,1,loc-1);
run;

 

But, it does not work anymore. loc dose not return me the smallest number possible when comparing loc1 and loc2. What should I do? Thank you.

Tom
Super User Tom
Super User

@Jarvin99 wrote:

Hi,

Sorry I have a question again. I tried to add one more delimiter "to become".


data test;
set have;
 loc1=findw(description,'and');
 loc2=findw(description,'but');
loc3=findw(description,'to become');
if loc1 and loc2 then loc=min(loc1, loc2);
 if loc1 and loc3 then loc=min(loc1, loc3);
if loc2 and loc3 then loc=min(loc2, loc3);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
 else if loc3 then loc=loc3;
if loc then description_1=substrn(description,1,loc-1);
run;

 

But, it does not work anymore. loc dose not return me the smallest number possible when comparing loc1 and loc2. What should I do? Thank you.


Your IF conditions are wrong when there are three instead of two search terms.

Perhaps a different algorithm would be easier to extend when there are more than two search terms.

data test;
  set have;
  loc=.;

  loc_next=findw(description,'and');
  if loc_next then loc=min(loc,loc_next);

  loc_next=findw(description,'but');
  if loc_next then loc=min(loc,loc_next);

  loc_next=findw(description,'to become');
  if loc_next then loc=min(loc,loc_next);

  if loc>1 then description_1=substr(description,1,loc-1);
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 813 views
  • 1 like
  • 4 in conversation