I have long strings such as "James E Gooding will step down as Chairman and remain as Financial Expert of the Audit Committee" and "Sushi Danooo Kronick steps down as Chairman but remains as Member of the Investment and Finance Committee".
I want to keep all the words before the first word "and" or "but".
So, the results I want are "James E Gooding will step down as Chairman" and "Sushi Danooo Kronick steps down as Chairman".
I have tried scan and substr, but for scan, it seems there is a limit for extracting words. Only parts of the strings were copied. For substr, it does not work as the log said the word types were changed.
Thank you so much.
You want FINDW() to look for a WORD.
data want;
set have;
loc1=findw(string,'and','i');
loc2=findw(string,'but','i');
if loc1 and loc2 then loc=min(loc1,loc2);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
if loc then want=substrn(string,1,loc-1);
run;
I wouldn't be surprised if I'm overlooking a more concise solution.
data have;
var = 'James E Gooding will step down as Chairman and remain as Financial Expert of the Audit Committee'; output;
var = 'Sushi Danooo Kronick steps down as Chairman but remains as Member of the Investment and Finance Committee'; output;
;
run;
data want;
set have;
want = trim(substr(var, 1, prxmatch('/and|but/', var) - 1));
run;
@Jarvin99 wrote:
Hi, may I know why substr(description,1,find(description,'and','i')-1) does not work?
FIND will find the string even when embedded in a word:
data example; string ="brand name information and other information"; x = find(string,'and'); run;
Which returns 3 for the found position in the middle of "brand".
Sometimes you get away with it depending on the data but a relatively common string like "and" is in many words.
FINDW will restrict to things on "word" boundaries, which can be specified by additional parameters.
Hint: when you say "does not work" it often is a good idea to provide a working example and we can indicate what was found instead of what you expected.
You want FINDW() to look for a WORD.
data want;
set have;
loc1=findw(string,'and','i');
loc2=findw(string,'but','i');
if loc1 and loc2 then loc=min(loc1,loc2);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
if loc then want=substrn(string,1,loc-1);
run;
Hi,
Sorry I have a question again. I tried to add one more delimiter "to become".
data test;
set have;
loc1=findw(description,'and');
loc2=findw(description,'but');
loc3=findw(description,'to become');
if loc1 and loc2 then loc=min(loc1, loc2);
if loc1 and loc3 then loc=min(loc1, loc3);
if loc2 and loc3 then loc=min(loc2, loc3);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
else if loc3 then loc=loc3;
if loc then description_1=substrn(description,1,loc-1);
run;
But, it does not work anymore. loc dose not return me the smallest number possible when comparing loc1 and loc2. What should I do? Thank you.
@Jarvin99 wrote:
Hi,
Sorry I have a question again. I tried to add one more delimiter "to become".
data test;
set have;
loc1=findw(description,'and');
loc2=findw(description,'but');
loc3=findw(description,'to become');
if loc1 and loc2 then loc=min(loc1, loc2);
if loc1 and loc3 then loc=min(loc1, loc3);
if loc2 and loc3 then loc=min(loc2, loc3);
else if loc1 then loc=loc1;
else if loc2 then loc=loc2;
else if loc3 then loc=loc3;
if loc then description_1=substrn(description,1,loc-1);
run;
But, it does not work anymore. loc dose not return me the smallest number possible when comparing loc1 and loc2. What should I do? Thank you.
Your IF conditions are wrong when there are three instead of two search terms.
Perhaps a different algorithm would be easier to extend when there are more than two search terms.
data test;
set have;
loc=.;
loc_next=findw(description,'and');
if loc_next then loc=min(loc,loc_next);
loc_next=findw(description,'but');
if loc_next then loc=min(loc,loc_next);
loc_next=findw(description,'to become');
if loc_next then loc=min(loc,loc_next);
if loc>1 then description_1=substr(description,1,loc-1);
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.