Hello,
I'm looking to create a variable that contains the word directly before and directly after a keyword.
For example, if the keyword I would like to search for is "Apple", then:
The apple is green and red ==> "the apple is"
I love to eat a lot of apples ==> "of apples"
I like orange juice ==> ""
And I would like to apply that to multiple keywords.
Thanks!
CF
How many keywords are you talking about? The code below works for the example you provided, but it will almost certainly need to be modified to handle all of your data.
data want;
  input @1 text $40.;
  /* optional word, whitespace, apple/apples, whitespace, optional word */
  prxExp = prxparse('/(\w{0,})(\sapples?\s)(\w{0,})/i');
  if prxmatch(prxExp, text) > 0 then do;
    /* capture buffer 0 is the entire match */
    want_var = prxposn(prxExp, 0, text);
  end;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
I have 5 keywords.
Thanks for your help!!
CF
Five is small enough that you could probably just add them to the example I gave you.
I'm sure there's a more elegant way, but if this meets your needs...
prxExp = prxparse('/(\w{0,})(\sapples?\s|\skeyword2\s|\skeyword3\s)(\w{0,})/i');
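As a rough sketch of how that line could slot into the full step, the alternation can be kept in a macro variable so the keyword list is easy to edit. The macro variable name keywords and the three keywords in it are placeholders, not your real list, and like the original pattern this one needs whitespace on both sides of the keyword, so a keyword at the very start of the text is not picked up.

```sas
/* Hypothetical keyword list: replace with your own keywords, separated by | */
%let keywords = apples?|oranges?|bananas?;

data want;
  infile datalines truncover;
  input @1 text $40.;
  length want_var $40;
  /* Compile the pattern once on the first row and keep its id */
  retain prxExp;
  if _n_ = 1 then prxExp = prxparse("/(\w*)\s(&keywords)\s(\w*)/i");
  /* Capture buffer 0 is the whole match: word before, keyword, word after */
  if prxmatch(prxExp, text) > 0 then want_var = prxposn(prxExp, 0, text);
  drop prxExp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
```

The pattern is in double quotes so that &keywords resolves before PRXPARSE sees it. Only the first match in each row is kept.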
Can a phrase have more than 1 keyword? If so, what do you want to do?
I found a variation that works well. When the keyword sits next to a period, comma, or other non-word character, the pattern still matches and includes that character:
prxExp1 = prxparse('/(\w{0,})(\W{1,})(word1|word2|word3|word4|word5)(\W{1,})(\w{0,})/i');
if prxmatch(prxExp1, text) > 0 then do;
  string = prxposn(prxExp1, 0, text);
end;
I would go for:
data want;
  if not prxId then
    prxId + prxparse("/(\w+\s+)?\b(apples?|cherr(y|ies)|bananas?)\b(\s+\w+)?/io");
  infile datalines truncover;
  input text $100.;
  length extract $100;
  start = 1; stop = -1;
  call prxnext(prxId, start, stop, text, pos, len);
  if pos = 0 then output;
  do while (pos > 0);
    extract = substr(text, pos, len);
    output;
    call prxnext(prxId, start, stop, text, pos, len);
  end;
  keep text extract;
datalines;
The apple is green and red
I love to eat a lot of apples
The apple is green and red
I said "Apple".
I love to eat pineapples
APPLES, ORANGES, and BANANAS
I like orange juice
;
proc print; by text notsorted; id text; run;
data want;
  input text $40.;
  do i = 1 to countw(text, ' ');
    temp = scan(text, i, ' ');
    if find(temp, 'apple', 'i') then do;
      want = catx(' ', scan(text, i-1, ' '), temp, scan(text, i+1, ' '));
    end;
  end;
  drop i temp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;