Return the word after and before a keyword

Contributor
Posts: 43

Hello,

I'm looking to create a variable that contains the word directly before and directly after a keyword.

For example, if the keyword I would like to search for is "Apple" then:

The apple is green and red ==> "the apple is"

I love to eat a lot of apples ==> "of apples"

I like orange juice ==> ""

And I would like to apply that to multiple keywords.

Thanks!

CF


Accepted Solutions
Solution
‎10-30-2017 07:04 AM
PROC Star
Posts: 311

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

How many keywords are you talking about? The code below works for the example you provided, but it will almost certainly need to be modified to handle all of your data.

data want;
    input @1 text $40.;

    /* Optional word, the keyword surrounded by whitespace, optional word */
    prxExp = prxparse('/(\w{0,})(\sapples?\s)(\w{0,})/i');
    if prxmatch(prxExp, text) > 0 then do;
        /* Capture buffer 0 is the entire match */
        want_var = prxposn(prxExp, 0, text);
    end;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
run;

Contributor
Posts: 43

Re: Return the word after and before a keyword

Posted in reply to collinelliot

I have 5 keywords.

 

Thanks for your help!!
CF

PROC Star
Posts: 311

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

Five is few enough that you could probably just add them as alternatives to the pattern in the example I gave you.

PROC Star
Posts: 311

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

 

I'm sure there's a more elegant way, but if this meets your needs...

 

prxExp = prxparse('/(\w{0,})(\sapples?\s|\skeyword2\s|\skeyword3\s)(\w{0,})/i');
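If the list is likely to change, one option is to keep the alternatives in a macro variable and let it resolve inside the pattern. This is only a sketch: the macro variable name keywords and the three keywords in it are placeholders, not taken from your data, and \w* is used as a shorter equivalent of \w{0,}. Like the pattern above, it still requires whitespace on both sides of the keyword.

```sas
/* Hypothetical keyword list: replace with your own five keywords */
%let keywords = apples?|bananas?|cherries;

data want;
    input @1 text $40.;
    length want_var $40;

    /* Double quotes so that &keywords resolves inside the pattern */
    prxExp = prxparse("/(\w*)\s(&keywords)\s(\w*)/i");
    if prxmatch(prxExp, text) > 0 then
        want_var = prxposn(prxExp, 0, text);   /* buffer 0 = entire match */

    drop prxExp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
run;
```

Adding or removing a keyword then only means editing the %let line.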

 

Trusted Advisor
Posts: 1,288

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

Can a phrase contain more than one keyword? If so, what do you want to do?

Contributor
Posts: 43

Re: Return the word after and before a keyword

Posted in reply to collinelliot

I found a variation that works well. It also matches when the keyword is next to a period, comma, or other non-word character:

prxExp1 = prxparse('/(\w{0,})(\W{1,})(word1|word2|word3|word4|word5)(\W{1,})(\w{0,})/i');
if prxmatch(prxExp1, text) > 0 then do;
    string = prxposn(prxExp1, 0, text);
end;
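Because that pattern has five capture groups, the surrounding words can also be pulled out individually with prxposn instead of taking the whole match. A minimal sketch, assuming two made-up keywords in place of word1 through word5; the variable names before, keyword, and after are illustrative only:

```sas
data want;
    infile datalines truncover;
    input text $60.;
    length before keyword after $20;

    /* Same shape as the pattern above, with hypothetical keywords */
    prxExp1 = prxparse('/(\w*)(\W+)(apples?|bananas?)(\W+)(\w*)/i');
    if prxmatch(prxExp1, text) > 0 then do;
        before  = prxposn(prxExp1, 1, text);   /* word before the keyword */
        keyword = prxposn(prxExp1, 3, text);   /* the keyword itself      */
        after   = prxposn(prxExp1, 5, text);   /* word after the keyword  */
    end;

    drop prxExp1;
datalines;
The apple, is green and red
I said "Apple".
I like orange juice
;
run;
```

Note that the leading \W+ means a keyword at the very start of the text has nothing to match against, so such rows would be skipped by this pattern.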

Esteemed Advisor
Posts: 5,399

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

I would go for:

 

data want;
if not prxId then 
    prxId + prxparse("/(\w+\s+)?\b(apples?|cherr(y|ies)|bananas?)\b(\s+\w+)?/io");

infile datalines truncover;
input text $100.;
length extract $100;

start = 1; stop = -1;
call prxnext(prxId, start, stop, text, pos, len);
if pos = 0 then output;
do while (pos > 0);
    extract = substr(text, pos, len);
    output;
    call prxnext(prxId, start, stop, text, pos, len);
    end;
keep text extract;
datalines;
The apple is green and red
I love to eat a lot of apples
The  apple is green and red
I said "Apple".
I love to eat pineapples
APPLES, ORANGES, and BANANAS
I like orange juice
;

proc print; by text notsorted; id text; run;
PG
Super User
Posts: 10,615

Re: Return the word after and before a keyword

Posted in reply to camfarrell25

data want;
input text $40.;
do i=1 to countw(text,' ');
 temp=scan(text,i,' '); 
 if find(temp,'apple','i') then do;
   want=catx(' ',scan(text,i-1,' '),temp,scan(text,i+1,' '));
 end;
end;
drop i temp;
datalines;
The apple is green and red
I love to eat a lot of apples
I like orange juice
;
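
A possible variation on the SCAN approach, offered as a sketch rather than a drop-in replacement. Since find() does a substring search, a word such as "pineapples" would also count as a hit; the version below compares whole words instead, after stripping punctuation with compress(). It also only looks to the left when there is a word to the left. The keyword list and the helper variables before_word and after_word are assumptions for illustration.

```sas
data want;
    infile datalines truncover;
    input text $40.;
    length want $60 temp before_word after_word $40;

    do i = 1 to countw(text, ' ');
        temp = scan(text, i, ' ');
        /* Keep letters only, then compare whole words (hypothetical list) */
        if lowcase(compress(temp, , 'ka')) in ('apple', 'apples') then do;
            /* Only look left when the keyword is not the first word */
            if i > 1 then before_word = scan(text, i - 1, ' ');
            else before_word = ' ';
            after_word = scan(text, i + 1, ' ');
            want = catx(' ', before_word, temp, after_word);
        end;
    end;

    drop i temp before_word after_word;
datalines;
The apple is green and red
I love to eat a lot of apples
I love to eat pineapples
I like orange juice
;
run;
```

As in the code above, if a line contains several keywords only the last hit is kept in want.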

☑ This topic is solved.
