Looking for Words within a variable that is text notes

Occasional Contributor
Posts: 19

Hello,

I am looking for a way to pull observations that have certain keywords in the notes variable. For instance, the notes might be "Customer called in to say he hired an Attorney".

I want to search the text variable, and if it contains attorney, lawsuit, or any of a number of other words, pull that observation along with the keyword found in that field. So I would get account # 4512 and notes = attorney.

I tried using the INDEXC function, but it matches individual characters of the keywords rather than whole words, so it brings back stuff that I don't need.

X = indexc (notes, "lawsuit", "court", "lies", "illegal");

brieftnotes = substr(notes,X,17);

So if the notes contain "laughing at his comment", this comes back as a hit because it starts with an L, even though I am only looking for lawsuit.
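To see that behavior concretely, here is a minimal sketch (the dataset name indexc_demo is made up for illustration). INDEXC treats each excerpt as a set of individual characters and returns the position of the first character in the source that appears in any of them, so almost any note is a hit:

```sas
/* Shows why INDEXC gives false positives: it matches single characters, not words */
data indexc_demo;
  length notes $200;
  notes = "laughing at his comment";
  /* "l" is one of the characters in "lawsuit", so INDEXC returns 1 */
  pos_indexc = indexc(notes, "lawsuit", "court", "lies", "illegal");
  output;
  notes = "Customer called in to say he hired an Attorney";
  /* INDEXC is case-sensitive: "C" is in no excerpt, but "u" (position 2) is */
  pos_indexc = indexc(notes, "lawsuit", "court", "lies", "illegal");
  output;
run;

proc print data=indexc_demo;
run;
```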

Thanks for your help.


Accepted Solutions
Solution
08-18-2015 07:48 AM
Respected Advisor
Posts: 3,799

Re: Looking for Words within a variable that is text notes

You might want to consider a regular expression.

prxparse('/\blawsuits?\b|\battorney\b/i');

This looks for the whole WORDS (\b marks a word boundary) 'lawsuit', 'lawsuits', or 'attorney', ignoring case (the i modifier). RTM for details on how to use a regex.
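A minimal sketch of putting that pattern to work, assuming a dataset TEST with a character variable TEXT (the sample rows are taken from later in this thread; the KEYWORD variable is illustrative). CALL PRXSUBSTR returns the position and length of the first match, which is enough to pull out the keyword itself, as the question asked:

```sas
/* Sample rows taken from this thread */
data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
run;

data want;
  /* Compile the pattern once and keep its ID across iterations */
  retain rx;
  if _n_ = 1 then rx = prxparse('/\blawsuits?\b|\battorney\b/i');
  set test;
  length keyword $20;
  /* Position and length of the first match; position is 0 when nothing matches */
  call prxsubstr(rx, text, position, len);
  if position > 0 then do;
    keyword = lowcase(substr(text, position, len));
    output;
  end;
  drop rx position len;
run;
```

Only rows with a match are written, so WANT should hold the Lawsuit and ATTORNEY rows with keyword set to lawsuit and attorney.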



All Replies
Contributor dkb
Contributor
Posts: 53

Re: Looking for Words within a variable that is text notes

Have a look at the documentation for INDEXW:

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition
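A minimal sketch of the INDEXW approach, assuming the same TEST data used elsewhere in this thread plus the false-positive note from the question (its key value of 3, the keyword list, and the KEYWORD variable are made up for illustration). INDEXW is case-sensitive, so the text is upper-cased to match the upper-case list, and it only matches whole words:

```sas
/* Sample rows from this thread, plus the false positive from the question */
data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
  key=3; text="laughing at his comment"; output;
run;

data want;
  set test;
  length word keyword $20;
  /* Hypothetical keyword list, in upper case to match UPCASE(text) */
  keywords = "LAWSUIT COURT LIES ILLEGAL ATTORNEY";
  do i = 1 to countw(keywords, " ");
    word = scan(keywords, i, " ");
    /* INDEXW matches whole words, so "laughing" is not a hit for LAWSUIT */
    if indexw(upcase(text), strip(word)) > 0 then do;
      keyword = lowcase(word);
      output;
      leave;  /* stop at the first keyword found */
    end;
  end;
  drop i word keywords;
run;
```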

Super User
Posts: 7,942

Re: Looking for Words within a variable that is text notes

It's generally a good idea to post some test data in the form of a DATA step, along with the required output. As you have a list of words, I would first put these in a dataset:

data words;
  length word $32; /* long enough for longer keywords added later */
  word="attorney"; output;
  word="lawsuit"; output;
run;

data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
run;

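/* Generate one IF statement per word, then run the generated DATA step */
/* Note: a row that contains several of the words is output once per matching word */
data _null_;
  set words end=last;
  if _n_=1 then call execute('data want; set test; ');
  call execute(cats(' if findw(upcase(text),"',upcase(word),'") > 0 then output;'));
  if last then call execute('run;');
run;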

This may seem like overkill for two words, but once you have a few more words it does simplify things.
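If you would rather not generate code, a hedged alternative (a PROC SQL join swapped in for CALL EXECUTE, not what was posted above) is to pair every note with every word and keep the pairs where FINDW finds the word. It is still driven by the WORDS dataset, returns one row per note/word match, and carries the matching word along, which is closer to the "account and keyword" output asked for:

```sas
data words;
  length word $32;
  word="attorney"; output;
  word="lawsuit"; output;
run;

data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
run;

proc sql;
  create table want as
    select t.key, t.text, w.word
    from test as t, words as w
    /* STRIP removes the padding blanks stored in WORD before the search */
    where findw(upcase(t.text), strip(upcase(w.word))) > 0
    order by t.key, w.word;
quit;
```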

Regular Contributor
Posts: 216

Re: Looking for Words within a variable that is text notes

Here is another solution, using a macro:

options mprint; /* Writes the macro-generated statements to the log for debugging */

/* Macro that builds a DATA step IF statement from the parameters passed in */
%macro wordsSearch (
  p_words=        /* Space-delimited list of words to search for */
, p_varName=      /* DATA step variable to be searched */
);
    %local l_i l_wCount l_word;
    /* Count the blanks: one fewer than the number of words, assuming single spaces */
    %let l_wCount = %sysfunc(countc(%superq(p_words),%str( )));
    %str(if %()
    %do l_i=1 %to &l_wCount;
        %let l_word = %upcase(%scan(%superq(p_words),&l_i,%str( )));
        %str(%(FINDW(UPCASE(&p_varName),"&l_word")%) OR)
    %end;
    /* After the loop, l_i points at the last word, which gets no trailing OR */
    %let l_word = %upcase(%scan(%superq(p_words),&l_i,%str( )));
    %str(%(FINDW(UPCASE(&p_varName),"&l_word")%))
    %str(%) then output;)
%mend wordsSearch;

/* Usage Example */
data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
run;

data want;
    set test;
    %wordsSearch(p_words=%str(attorney lawsuit), p_varName=text);
run;
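To check the macro, it can help to write out by hand the step it should generate for the call above. The sketch below is that hand-expanded equivalent (the MPRINT output in the log will differ in spacing); it should select the same two rows:

```sas
data test;
  length text $200;
  key=1; text="abcd whwysj iejroreo"; output;
  key=1; text="Lawsuit dfaf afa afafa"; output;
  key=2; text="asjhafh dahs ATTORNEY"; output;
run;

/* Hand-expanded equivalent of %wordsSearch(p_words=%str(attorney lawsuit), p_varName=text) */
data want;
  set test;
  if ( (FINDW(UPCASE(text),"ATTORNEY")) OR (FINDW(UPCASE(text),"LAWSUIT")) ) then output;
run;
```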


Occasional Contributor
Posts: 19

Re: Looking for Words within a variable that is text notes

Posted in reply to data_null__

data_null_,

Please let me know more about your solution.

Respected Advisor
Posts: 3,799

Re: Looking for Words within a variable that is text notes

The regex is a list of words to search for, separated by OR (|). The \b before and after each word restricts the search to whole words (you may not want that), and the ? following the s in lawsuits makes the s optional, so the search finds lawsuit or lawsuits.

data test;
  input text $80.;
cards;
abcd lawsuitswhwysj iejroreo
Lawsuit dfaf afa afafa
asjhafh dahs ATTORNEY
laughing at his comment
laughing at his lies
;
run;

data test2;
   /* Compile the regex once, on the first iteration, and keep its ID */
   if _n_ eq 1 then rx=prxparse('/\blawsuits?\b|\battorney\b|\bcourt\b|\blies\b|\billegal\b/i');
   retain rx;
   set test;
   /* Position of the first match in TEXT, or 0 if none of the words is found */
   match = prxmatch(rx,text);
run;
proc print;
run;
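PRXMATCH only reports the first match. If a note can contain several keywords and you want each one as its own row, CALL PRXNEXT can be called in a loop until no further match is found. A minimal sketch, reusing the test data above (the dataset name HITS and the KEYWORD variable are illustrative):

```sas
data test;
  infile cards truncover;
  input text $80.;
cards;
abcd lawsuitswhwysj iejroreo
Lawsuit dfaf afa afafa
asjhafh dahs ATTORNEY
laughing at his comment
laughing at his lies
;
run;

data hits;
  retain rx;
  if _n_ = 1 then rx = prxparse('/\blawsuits?\b|\battorney\b|\bcourt\b|\blies\b|\billegal\b/i');
  set test;
  length keyword $20;
  start = 1;
  stop = length(text);
  /* Find the first match, then keep calling PRXNEXT until no more are found */
  call prxnext(rx, start, stop, text, position, len);
  do while (position > 0);
    keyword = lowcase(substr(text, position, len));
    output;
    call prxnext(rx, start, stop, text, position, len);
  end;
  keep text keyword;
run;
```

For this data HITS should have three rows (lawsuit, attorney, lies); the lawsuitswhwysj row is not a hit because of the \b word boundaries.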