Bhuvaneswari
Obsidian | Level 7

Hello,

 

I'm looking for help extracting a sentence that contains a specific word from a string (this is a character variable in SAS).

Say I'm looking for the word 'experience' in a review and want to extract only the sentences that use it.

 

Input: Variable named comment has text "The worst experience I've ever had with them, especially after being with them for over a decade.I will not be recommending this gym and I will be contacting corporate as well as the Better Business Bureau." 

 

output: Variable named extract with sentence "The worst experience I've ever had with them, especially after being with them for over a decade" containing the word experience.

 

Kindly help.

 

Regards,

Bhuvana

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here's one approach.

 

data example;
   comment ="The worst experience I've ever had with them, especially after being with them for over a decade.I will not be recommending this gym and I will be contacting corporate as well as the Better Business Bureau." ;
   length extract sentence $200.;
   /* Loop over every period-delimited piece of the comment */
   do i = 1 to countw(comment,'.');
      sentence = scan(comment,i,'.');
      /* Case-insensitive whole-word search; blank . , / are the word delimiters */
      if findw( sentence,'experience',' .,/','i')>0 then do;
         extract= catt(sentence,'.') ;
         /* Write one record per matching sentence */
         output;
      end;
   end;
   drop sentence i;
run;

You do not specify what to do if the target word occurs in two or more sentences. The above loop would create a separate record for each sentence found.

 

If you think you have other "sentence" delimiters involved, such as ;, add them to the SCAN function (and to the COUNTW call, so the loop count matches).

The CATT puts back the period that SCAN removes from the sentence.
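
As a rough sketch of the delimiter advice above: the delimiter list '.;!?' and the sample comment below are assumptions for illustration, not part of the original question. The same list goes into both COUNTW and SCAN, and the extra punctuation is also added to the FINDW word delimiters so that a word followed by ! still counts as a word.

```sas
data example_delims;
   length comment $300 extract sentence $200;
   /* Made-up sample text for illustration only */
   comment = "Very bad experience! Would I go back? No; the staff were rude. A poor experience overall.";
   /* Assumed sentence delimiters: period, semicolon, exclamation mark, question mark */
   do i = 1 to countw(comment, '.;!?');
      sentence = scan(comment, i, '.;!?');
      /* Word delimiters extended so punctuation next to the word does not block a match */
      if findw(sentence, 'experience', ' .,/!?;', 'i') > 0 then do;
         /* The original end punctuation is unknown after SCAN, so none is added back */
         extract = strip(sentence);
         output;
      end;
   end;
   drop sentence i;
run;
```

With more than one delimiter you no longer know which character ended each sentence, so this version stores the sentence without a trailing mark rather than guessing.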


8 REPLIES 8

Bhuvaneswari
Obsidian | Level 7

Thanks for quick help. Have a good weekend!

Bhuvaneswari
Obsidian | Level 7

I have another request: how can I put all such sentences in one record? That is, I want to append those sentences to the output variable extract instead of having one record per sentence.

 

Thanks again!

ballardw
Super User

Here is a modified example.

 

data example;
   comment ="The worst experience I've ever had with them. The worst service in a decade. Because of this experience I will not be recommending this gym." ;
   length extract sentence $200.;
   do i = 1 to countw(comment,'.');
      sentence = scan(comment,i,'.');
      if findw( sentence,'experience',' .,/','i')>0 then do;
         /* Append each matching sentence to extract, separated by a single blank */
         extract= catx(' ',extract,catt(sentence,'.')) ;
      end;
   end;
   /* No OUTPUT statement: one record is written at the end of the step */
   drop sentence i;
run;

Note that I did change the original comment.

 

The Length assignment for Extract in this case should likely be the length of the comment variable in practice.
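
A minimal sketch of that length advice, assuming the comments sit in a data set. The data set name reviews, the output name extracted and the $500 length are hypothetical; use whatever your real comment variable has.

```sas
/* Hypothetical input data set; the name and the $500 length are assumptions */
data reviews;
   length comment $500;
   comment = "The worst experience I've ever had with them. The worst service in a decade. Because of this experience I will not be recommending this gym.";
   output;
   comment = "Great staff. Clean showers.";
   output;
run;

data extracted;
   set reviews;
   /* extract gets the same length as comment so appended sentences are not truncated */
   length extract sentence $500;
   do i = 1 to countw(comment, '.');
      sentence = scan(comment, i, '.');
      if findw(sentence, 'experience', ' .,/', 'i') > 0 then
         extract = catx(' ', extract, catt(sentence, '.'));
   end;
   drop sentence i;
run;
```

Because extract is not read from reviews and is not retained, it starts out missing for each input record, so sentences from one comment should not carry over into the next. Records without the word end up with a blank extract.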

Bhuvaneswari
Obsidian | Level 7

I have no more than 10 occurrences in the string, and I would like to have them all in one character variable instead of one record per occurrence.

 

say if the input variable comment has "The worst experience I've ever had with them, especially after being with them for over a decade.I will not be recommending this gym and I will be contacting corporate as well as the Better Business Bureau.
I hope my experience can be of help."

 

Expected output variable extract should have "The worst experience I've ever had with them, especially after being with them for over a decade.I hope my experience can be of help."

Thanks again!

Bhuvaneswari
Obsidian | Level 7

Thanks a lot! That helped!!

ilikesas
Barite | Level 11

Hi ballardw,

 

I experimented a bit with the code and noticed that if I write something like "very bad experience!", SAS won't output it, but if I omit the exclamation mark after the word "experience", then SAS outputs it. So the code looks for the exact word "experience". Is it possible to make the code search for the presence of a string, even if it is part of a bigger string?

 

Thank you!

ballardw
Super User

Delimiter choices: add or remove as needed. I used only periods for the SCAN part; you could add any character you want, but make sure it really is a "sentence" delimiter in your data, where a sentence is whatever group of words you want to treat as one. Note that this approach has a potentially more complex problem. "My experience meant I would like to rate it 4.5 but the entry window would not allow that" would require additional steps to identify whether the . in 4.5 is actually a sentence end. Likewise, with "For a price of 53.67 the experience was too expensive and I won't return", the extracted sentence would start with "67 the experience ...". Free-form language is rife with similar issues. Checking for a pattern like digit.digit might fix some cases, but what about the guy who writes $.02 for "two cents worth"? Or an email address?
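
One possible way to handle the digit.digit case, as a sketch only: swap the period in patterns like 53.67 for a placeholder before splitting, then put it back. It assumes the character ~ never appears in your text, and it does nothing for cases like $.02 or email addresses. The variable safe is a name made up for this example.

```sas
data example_decimals;
   length comment safe $300 extract sentence $200;
   comment = "For a price of 53.67 the experience was too expensive. I won't return.";
   /* Assumption: '~' never occurs in the text, so it can stand in for a decimal point */
   safe = prxchange('s/(\d)\.(\d)/$1~$2/', -1, comment);
   do i = 1 to countw(safe, '.');
      sentence = scan(safe, i, '.');
      if findw(sentence, 'experience', ' .,/', 'i') > 0 then do;
         /* TRANSLATE(source, to, from): restore the decimal points before saving */
         extract = catt(translate(sentence, '.', '~'), '.');
         output;
      end;
   end;
   drop safe sentence i;
run;
```

The intent is that the extracted sentence keeps "53.67" whole instead of starting at "67 the experience". Strings with back-to-back decimals such as 1.2.3 may need a more careful pattern.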

 

Any text you search for has rules. Pick the appropriate tool: INDEX, FIND, INDEXW, FINDW, and sometimes pray. If the result quality needs to be high, then often you get a human involved, or a much better trained AI than I know how to access.
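
To illustrate the tool choice for the "part of a bigger string" question, here is a small comparison sketch. The test sentences and the variable names word_hit and substring_hit are made up for the example. FINDW only matches a whole word bounded by the listed delimiters, while FIND matches the text anywhere.

```sas
data example_find;
   infile datalines truncover;
   input sentence $100.;
   /* FINDW: whole word only, bounded by blank . , / (note that ! is not in the list) */
   word_hit = findw(sentence, 'experience', ' .,/', 'i') > 0;
   /* FIND: any occurrence, even inside a longer string such as "experienced" */
   substring_hit = find(sentence, 'experience', 'i') > 0;
   datalines;
very bad experience!
an experienced trainer
no match here
;
run;
```

With these delimiters, the first two lines should be flagged only by substring_hit and the third by neither. Swapping FINDW for FIND in the loop gives the substring behaviour, at the cost of also catching words like "inexperienced".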
