Extract a sentence in a string that has a word in different forms

Contributor
Posts: 22

Hello,

 

Hope you all had a good weekend.

 

This question is similar to the one I posted last week, but I need help making the code more robust.

Could you please help me extract sentences that contain a word in any of several forms?

 

For example, all sentences containing different forms of the word 'experience', such as experience, ExperiEnce-based, Experienced, Experiencing and so on. Can we get the desired results using regex?

 

Here is the link to the question

https://communities.sas.com/t5/General-SAS-Programming/Extra-a-sentence-in-a-string-that-has-a-speci...

 

Thanks a lot, appreciate the help!

 

Regards,

Bhuvana


Accepted Solutions
Solution
‎04-10-2017 11:12 AM
Super User
Posts: 5,079

Re: Extract a sentence in a string that has a word in different forms

It looks like you already have code to pick out one sentence at a time.  The easy way to make the code more robust would be to use the FIND function:

 

if find(sentence, 'experience', 'i') > 0 then ...

 

While you could use INDEX instead of FIND, the advantage of FIND is that you can add the "i" modifier to ignore differences in case.
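
As a minimal sketch of the difference between the two functions (the sample sentence and the variable names pos_index and pos_find are made up for illustration):

```sas
data _null_;
    sentence = 'The worst ExperiEnce-based service I have ever had';
    /* INDEX is case-sensitive, so the mixed-case word is not matched */
    pos_index = index(sentence, 'experience');
    /* FIND with the 'i' modifier ignores case and returns the match position */
    pos_find  = find(sentence, 'experience', 'i');
    put pos_index= pos_find=;
run;
```

Here pos_index should come back as 0 and pos_find as a positive position, because only FIND is told to ignore case.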



All Replies
Super User
Posts: 7,392

Re: Extract a sentence in a string that has a word in different forms

You can use fuzzy searching methods as described in this article:

http://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/

 

Contributor
Posts: 22

Re: Extract a sentence in a string that has a word in different forms

Hello,

 

Thanks for the response. I used that, but it is not picking up sentences in which the word is followed by a hyphen, like 'experience-based'.

For instance, the code below is not picking the first sentence 


data example;
    comment = "The worst experience-based I've ever had with them. The worst service in a decade. Because of this experience I will not be recommending this gym.";
    length extract sentence $200;
    do i = 1 to countw(comment, '.');
        sentence = scan(comment, i, '.');
        if findw(sentence, 'Experience', ' .,/', 'i') > 0 then do;
            extract = catx(' ', extract, catt(sentence, '.'));
        end;
    end;
    drop sentence i;
run;

 

Please advise on how to proceed. The data I'm working on is not well written, so I want the code to match the word even when it sits in the middle of a longer string with no surrounding spaces, say 'CauseExperience-Based' and so on.

 

Thanks a lot!

Super User
Posts: 5,079

Re: Extract a sentence in a string that has a word in different forms

Don't switch to FINDW.  Use FIND.  It will find the characters, not the word.
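
To illustrate the point, here is a small sketch comparing the two functions on strings taken from this thread (the variable names s, w and f are only for illustration):

```sas
data _null_;
    length s $40;
    do s = 'this experience was', 'experience-based', 'CauseExperience-Based';
        /* FINDW needs the word to be bounded by one of the listed delimiters */
        w = findw(s, 'Experience', ' .,/', 'i');
        /* FIND only needs the character sequence to appear somewhere */
        f = find(s, 'experience', 'i');
        put s= w= f=;
    end;
run;
```

FINDW should return 0 for the second and third strings, because '-' and the letters of 'Cause' are not in the delimiter list, while FIND returns a positive position for all three.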

Contributor
Posts: 22

Re: Extract a sentence in a string that has a word in different forms

Switching to find() instead of findw() is returning a null string. Please advise.

Super User
Posts: 5,079

Re: Extract a sentence in a string that has a word in different forms

You'll have to show what you actually did here.  (For example, it might be that the delimiter list, which was the third parameter to FINDW, must be removed so that the call matches the FIND call in my original post.)

Contributor
Posts: 22

Re: Extract a sentence in a string that has a word in different forms

That worked! Thanks a ton. Can you please tell me how find() can pick only the sentence that has the word in it without explicitly mentioning the delimiter? What if I want to change the delimiter to something other than a period?

 

Thanks again for the assistance!

Super User
Posts: 5,079

Re: Extract a sentence in a string that has a word in different forms

With FIND, there are no delimiters.  It is looking for that sequence of characters anywhere in the sentence.  With a long word like "experience" that shouldn't cause any issues. 
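
Putting the thread together, a sketch of the data step with FIND might look like the following. The sentence delimiter is used only by COUNTW and SCAN, so that is where it would be changed; the variable dlm is a hypothetical addition, not part of the original code.

```sas
data example;
    comment = "The worst experience-based I've ever had with them. The worst service in a decade. Because of this experience I will not be recommending this gym.";
    length extract sentence $200;
    /* Hypothetical variable: sentence delimiter used for splitting only.
       A longer value such as '.!?' would require length dlm $3 first. */
    dlm = '.';
    do i = 1 to countw(comment, dlm);
        sentence = scan(comment, i, dlm);
        /* FIND matches the character sequence anywhere, ignoring case */
        if find(sentence, 'experience', 'i') > 0 then
            extract = catx(' ', extract, catt(sentence, '.'));
    end;
    drop sentence i dlm;
run;
```

With this input, extract should hold the first and third sentences. Note that a substring match also accepts words such as 'inexperienced', which may or may not be wanted.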

Contributor
Posts: 22

Re: Extract a sentence in a string that has a word in different forms

Thanks for the help! Have a good day!

☑ This topic is SOLVED.
