Hello,
I'm looking to create a variable that contains the word directly before and directly after a keyword.
For example, if the keyword I would like to search for is "Apple", then:
The apple is green and red ==> "the apple is"
I love to eat a lot of apples ==> "of apples"
I like orange juice ==> ""
And I would like to apply that to multiple keywords.
Thanks!
CF
How many keywords are you talking about? The code below works for the example you provided, but it will certainly have to be modified to handle all of your data.
data want;
input @1 text $40.;
/* Up to one word, then "apple" or "apples" with a space on each side, then up to one word; /i ignores case */
prxExp = prxparse('/(\w{0,})(\sapples?\s)(\w{0,})/i');
if prxmatch(prxExp, text) > 0 then do;
/* Group 0 is the whole match: word before + keyword + word after */
want_var = prxposn(prxExp, 0, text);
end;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
I have 5 keywords.
Thanks for your help!!
CF
Five is small enough that you could probably just add them to the example I gave you.
I'm sure there's a more elegant way, but if this meets your needs...
prxExp = prxparse('/(\w{0,})(\sapples?\s|\skeyword2\s|\skeyword3\s)(\w{0,})/i');
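With five keywords, the alternation can also be kept in one place by building it from a macro variable. This is a minimal sketch, assuming single-word keywords; the macro variable name `keywords` and the sample keywords are hypothetical stand-ins for the real list. Note that the pattern has to be in double quotes so the macro variable resolves.

```sas
/* Hypothetical keyword list: alternatives are separated by | */
%let keywords = apples?|cherry|cherries|bananas?;

data want;
   length text want_var $40;
   input @1 text $40.;
   /* Double quotes let &keywords resolve; /i ignores case, /o compiles the pattern once */
   prxExp = prxparse("/(\w{0,})(\s(&keywords)\s)(\w{0,})/io");
   if prxmatch(prxExp, text) > 0 then
      want_var = prxposn(prxExp, 0, text);  /* whole match: before + keyword + after */
   drop prxExp;
   datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
```

For the three sample lines, `want_var` should come out as "The apple is", "of apples", and blank. Adding a sixth keyword then only means editing the `%let` line.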
Can a phrase have more than 1 keyword? If so, what do you want to do?
I found a variation that works well. It also catches the keyword when it sits next to a period, comma, or other non-word character:
prxExp1 = prxparse('/(\w{0,})(\W{1,})(word1|word2|word3|word4|word5)(\W{1,})(\w{0,})/i');
if prxmatch(prxExp1, text) > 0 then do;
string = prxposn(prxExp1, 0, text);
end;
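Because the pattern in this variation has five capture groups, PRXPOSN can also return the word before (group 1), the keyword (group 3) and the word after (group 5) as separate variables. A minimal sketch; the keyword list and the sample lines are hypothetical stand-ins for the real five keywords, and the list avoids nested parentheses so the group numbers stay at 1, 3 and 5.

```sas
data want;
   length text $100 before keyword after $40;
   infile datalines truncover;
   input text $100.;
   /* Hypothetical keywords in place of word1-word5 */
   prxExp1 = prxparse('/(\w{0,})(\W{1,})(apples?|cherry|cherries|bananas?)(\W{1,})(\w{0,})/io');
   if prxmatch(prxExp1, text) > 0 then do;
      before  = prxposn(prxExp1, 1, text);  /* word before the keyword */
      keyword = prxposn(prxExp1, 3, text);  /* the keyword as written in the text */
      after   = prxposn(prxExp1, 5, text);  /* word after the keyword */
   end;
   drop prxExp1;
   datalines;
I love to eat a lot of apples, really
I said "Apple".
I like orange juice
;
```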
I would go for:
data want;
if not prxId then
prxId + prxparse("/(\w+\s+)?\b(apples?|cherr(y|ies)|bananas?)\b(\s+\w+)?/io");
infile datalines truncover;
input text $100.;
length extract $100;
start = 1; stop = -1;
call prxnext(prxId, start, stop, text, pos, len);
if pos = 0 then output;
do while (pos > 0);
extract = substr(text, pos, len);
output;
call prxnext(prxId, start, stop, text, pos, len);
end;
keep text extract;
datalines;
The apple is green and red
I love to eat a lot of apples
The apple is green and red
I said "Apple".
I love to eat pineapples
APPLES, ORANGES, and BANANAS
I like orange juice
;
proc print; by text notsorted; id text; run;
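data want;
input text $40.;
do i = 1 to countw(text, ' ');
temp = scan(text, i, ' ');
/* Case-insensitive search for "apple" inside the current word */
if find(temp, 'apple', 'i') then do;
/* Guard i = 1 so SCAN is never asked for word number 0 */
if i > 1 then prev = scan(text, i-1, ' ');
else prev = ' ';
want = catx(' ', prev, temp, scan(text, i+1, ' '));
end;
end;
drop i temp prev;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;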