Hello,
I'm looking to create a variable that contains the word directly before and directly after a keyword.
For example, if the keyword I want to search for is "Apple", then:
The apple is green and red ==> "the apple is"
I love to eat a lot of apples ==> "of apples"
I like orange juice ==> ""
I would also like to apply this to multiple keywords.
Thanks!
CF
How many keywords are you talking about? The code below works for the examples you provided, but it will almost certainly need to be modified to handle all of your data.
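data want;
   input @1 text $40.;
   /* Group 1: the word before; group 2: the keyword with the whitespace
      around it; group 3: the word after. The /i option ignores case. */
   prxExp = prxparse('/(\w{0,})(\sapples?\s)(\w{0,})/i');
   if prxmatch(prxExp, text) > 0 then do;
      /* Capture buffer 0 is the entire match */
      want_var = prxposn(prxExp, 0, text);
   end;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;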
I have 5 keywords.
Thanks for your help!!
CF
Five is small enough that you could probably just add them to the example I gave you.
I'm sure there's a more elegant way, but if this meets your needs...
prxExp = prxparse('/(\w{0,})(\sapples?\s|\skeyword2\s|\skeyword3\s)(\w{0,})/i');
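With five keywords, one way to keep the list in one place is to store the alternation in a macro variable and resolve it inside a double-quoted PRXPARSE pattern. This is a minimal sketch under stated assumptions: the five keywords are hypothetical placeholders, an optional trailing "s" is assumed to be enough to cover plurals, and \b word boundaries are used instead of the \s anchors so that a keyword followed by punctuation (for example "apples,") still matches.

```sas
/* Hypothetical keyword list: replace with your own five words.
   The pattern below adds an optional trailing "s" for plurals. */
%let keywords = apple|banana|cherry|grape|pear;

data want;
   infile datalines truncover;
   input text $100.;
   length want_var $100;
   retain prxExp;
   /* Compile the pattern once; the macro variable resolves inside double quotes */
   if _n_ = 1 then
      prxExp = prxparse("/(\w+\s+)?\b(&keywords)s?\b(\s+\w+)?/i");
   /* PRXMATCH returns the start position of the first match, or 0 if none */
   if prxmatch(prxExp, text) > 0 then
      want_var = prxposn(prxExp, 0, text);   /* buffer 0 = the entire match */
   drop prxExp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
```

Because the keyword list lives in a single %LET statement, adding or swapping a keyword does not require touching the pattern itself.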
Can a phrase contain more than one keyword? If so, what should happen?
I found a variation that works well: when there is a period, comma, or other non-word character next to the keyword, it still matches and includes that character in the result:
prxExp1 = prxparse('/(\w{0,})(\W{1,})(word1|word2|word3|word4|word5)(\W{1,})(\w{0,})/i');
if prxmatch(prxExp1, text) > 0 then do;
string = prxposn(prxExp1, 0, text);
end;
I would go for:
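data want;
   /* Compile the pattern once: the sum statement retains prxId across rows.
      Optional word before, the keyword on word boundaries, optional word after. */
   if not prxId then
      prxId + prxparse("/(\w+\s+)?\b(apples?|cherr(y|ies)|bananas?)\b(\s+\w+)?/io");
   infile datalines truncover;
   input text $100.;
   length extract $100;
   /* stop = -1 searches to the end of the string */
   start = 1; stop = -1;
   call prxnext(prxId, start, stop, text, pos, len);
   /* Keep rows with no match (extract stays blank) */
   if pos = 0 then output;
   /* Output one row per match */
   do while (pos > 0);
      extract = substr(text, pos, len);
      output;
      call prxnext(prxId, start, stop, text, pos, len);
   end;
   keep text extract;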
datalines;
The apple is green and red
I love to eat a lot of apples
The apple is green and red
I said "Apple".
I love to eat pineapples
APPLES, ORANGES, and BANANAS
I like orange juice
;
proc print; by text notsorted; id text; run;
data want;
   input text $40.;
   do i=1 to countw(text,' ');
      temp=scan(text,i,' ');
      if find(temp,'apple','i') then do;
         want=catx(' ',scan(text,i-1,' '),temp,scan(text,i+1,' '));
      end;
   end;
   drop i temp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
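To cover several keywords with the SCAN/FIND approach, one option is to loop over a temporary array of keywords. This is a minimal sketch assuming a hypothetical three-keyword list (substitute your own). As in the version above, FIND does a substring search, so "apple" would also match "pineapples", and if several words in a line match, the last match wins. The sketch also skips looking up word 0 when the keyword is the first word of the line.

```sas
data want;
   infile datalines truncover;
   input text $100.;
   length want $200 word before after $40;
   /* Hypothetical keyword list: replace with your own */
   array kw[3] $20 _temporary_ ('apple' 'banana' 'pear');
   do i = 1 to countw(text, ' ');
      word = scan(text, i, ' ');
      do k = 1 to dim(kw);
         /* FIND with the 'i' modifier is a case-insensitive substring search;
            STRIP removes the blank padding from the array element */
         if find(word, strip(kw[k]), 'i') then do;
            before = ' ';
            if i > 1 then before = scan(text, i - 1, ' ');
            after = scan(text, i + 1, ' ');   /* blank if there is no next word */
            want = catx(' ', before, word, after);
         end;
      end;
   end;
   drop i k word before after;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
```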