Hello,
I'm looking to create a variable that contains the word directly before and directly after a keyword.
For example, if the keyword I would like to search for is "Apple" then
The apple is green and red ==> "the apple is"
I love to eat a lot of apples ==> "of apples"
I like orange juice ==> ""
And I would like to apply that to multiple keywords.
Thanks!
CF
How many keywords are you talking about? The below works for the example you provided, but it will certainly have to be modified to address all of your data.
data want;
input @1 text $40.;
prxExp = prxparse('/(\w{0,})(\sapples?\s)(\w{0,})/i');
if prxmatch(prxExp, text) > 0 then do;
want_var = prxposn(prxExp, 0, text);
end;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
I have 5 keywords.
Thanks for your help!!
CF
Five is small enough that you could probably just add them to the example I gave you.
I'm sure there's a more elegant way, but if this meets your needs...
prxExp = prxparse('/(\w{0,})(\sapples?\s|\skeyword2\s|\skeyword3\s)(\w{0,})/i');
Can a phrase have more than 1 keyword? If so, what do you want to do?
I found a variation that works well - in the case that there's a period, comma or other non-text character, it also identifies it:
prxExp1 = prxparse('/(\w{0,})(\W{1,})(word1|word2|word3|word4|word5)(\W{1,})(\w{0,})/i');
if prxmatch(prxExp1, text) > 0 then do;
string = prxposn(prxExp1, 0, text);
end;
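If the words before and after the keyword are wanted as separate variables, the capture groups in a pattern like the one above can be read one at a time with prxposn. The following is only a sketch under some assumptions: the keywords (apples?, bananas?) and the variable names before, keyword and after are placeholders, not part of the original post, and the pattern is compiled once on the first row rather than on every row.

```sas
data want;
    infile datalines truncover;
    input text $100.;
    length before keyword after $40;
    /* Compile the pattern once and keep its id across rows */
    retain prxExp1;
    if _n_ = 1 then
        prxExp1 = prxparse('/(\w*)(\W+)(apples?|bananas?)(\W+)(\w*)/i');
    if prxmatch(prxExp1, text) > 0 then do;
        before  = prxposn(prxExp1, 1, text);  /* group 1: word before the keyword */
        keyword = prxposn(prxExp1, 3, text);  /* group 3: the keyword itself      */
        after   = prxposn(prxExp1, 5, text);  /* group 5: word after the keyword  */
    end;
    drop prxExp1;
    datalines;
The apple is green and red
I said "Apple".
I like orange juice
;
```

Because this pattern requires at least one non-word character on each side of the keyword, a keyword that is the very first word of the text will not match. Only the first match per row is captured here.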
I would go for:
data want;
if not prxId then
prxId + prxparse("/(\w+\s+)?\b(apples?|cherr(y|ies)|bananas?)\b(\s+\w+)?/io");
infile datalines truncover;
input text $100.;
length extract $100;
start = 1; stop = -1;
call prxnext(prxId, start, stop, text, pos, len);
if pos = 0 then output;
do while (pos > 0);
extract = substr(text, pos, len);
output;
call prxnext(prxId, start, stop, text, pos, len);
end;
keep text extract;
datalines;
The apple is green and red
I love to eat a lot of apples
The apple is green and red
I said "Apple".
I love to eat pineapples
APPLES, ORANGES, and BANANAS
I like orange juice
;
proc print; by text notsorted; id text; run;
data want;
input text $40.;
do i=1 to countw(text,' ');
temp=scan(text,i,' ');
if find(temp,'apple','i') then do;
want=catx(' ',scan(text,i-1,' '),temp,scan(text,i+1,' '));
end;
end;
drop i temp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;