<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Word distance algorithm to identify record of interest in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267277#M52806</link>
    <description/>
    <pubDate>Fri, 29 Apr 2016 16:33:01 GMT</pubDate>
    <dc:creator>DeepakSwain</dc:creator>
    <dc:date>2016-04-29T16:33:01Z</dc:date>
    <item>
      <title>Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267277#M52806</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;Based on help from the SAS community, I have tried&amp;nbsp;to identify records that contain specific words of interest in a particular order (e.g. medicine first and diet second), where the distance between these words is less than 7 words.&amp;nbsp;&lt;/P&gt;&lt;P&gt;data test;&lt;/P&gt;&lt;P&gt;xyz='She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as as well diet. You must take diet according to your dietician. Later we will think to revise your medicine'; *search for the word she;&lt;BR /&gt;First = 'medicine' ;&lt;BR /&gt;Second = 'diet' ;&lt;BR /&gt;array firsts (3) f1-f3;&lt;BR /&gt;Array seconds (3) s1-s3;&lt;BR /&gt;Findex=1;&lt;BR /&gt;Sindex=1;&lt;BR /&gt;do i = 1 to (countw(xyz));&lt;BR /&gt;if upcase(First) = upcase(Scan(xyz,i)) then do;&lt;BR /&gt;Firsts[Findex] = i;&lt;BR /&gt;Findex = Findex+1;&lt;BR /&gt;end;&lt;BR /&gt;if upcase(Second) = upcase(Scan(xyz,i)) then do;&lt;BR /&gt;Seconds[Sindex]=i;&lt;BR /&gt;Sindex = Sindex +1;&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;/P&gt;&lt;P&gt;do i = 1 to (n(of Firsts(*)));&lt;BR /&gt;put First "occurs in position" +1 Firsts[i] ;&lt;BR /&gt;end;&lt;BR /&gt;do j = 1 to (n(of seconds(*)));&lt;BR /&gt;put second "occurs in position" +1 seconds[j] ;&lt;BR /&gt;end;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;if (Firsts(1) lt seconds(1) and seconds(1) - firsts(1) le 6 and Firsts(1) ne . and seconds(1) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(1) lt seconds(2) and seconds(2) - firsts(1) le 6 and Firsts(1) ne . and seconds(2) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(1) lt seconds(3) and seconds(3) - firsts(1) le 6 and Firsts(1) ne . and seconds(3) ne . )&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(2) lt seconds(1) and seconds(1) - firsts(2) le 6 and Firsts(2) ne . and seconds(1) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(2) lt seconds(2) and seconds(2) - firsts(2) le 6 and Firsts(2) ne . and seconds(2) ne . )&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(2) lt seconds(3) and seconds(3) - firsts(2) le 6 and Firsts(2) ne . and seconds(3) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(3) lt seconds(1) and seconds(1) - firsts(3) le 6 and Firsts(3) ne . and seconds(1) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(3) lt seconds(2) and seconds(2) - firsts(3) le 6 and Firsts(3) ne . and seconds(2) ne . ) &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;or (Firsts(3) lt seconds(3) and seconds(3) - firsts(3) le 6 and Firsts(3) ne . and seconds(3) ne . ) ;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can somebody suggest some simplified code for the colored section?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance for your kind reply.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Deepak&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 16:33:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267277#M52806</guid>
      <dc:creator>DeepakSwain</dc:creator>
      <dc:date>2016-04-29T16:33:01Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267294#M52814</link>
      <description>&lt;P&gt;Look at this example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_; 

xyz='She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as as well diet. You must take diet according to your dietician. Later we will think to revise your medicine'; 
   First = 'medicine' ;
   Second ='diet' ;
   array firsts (4)  f1-f4; /* assumes that neither word occurs more than 4 times */
   Array seconds (4) s1-s4; 
   Findex=1;/* these index variables will point to where to store the word count in the arrays*/
   Sindex=1;
   do i = 1 to (countw(xyz));
      if upcase(First) = upcase(Scan(xyz,i)) then do;
         Firsts[Findex] = i;
         Findex = Findex+1;
      end;
      if upcase(Second) = upcase(Scan(xyz,i)) then do;
         Seconds[Sindex]=i;
         Sindex = Sindex +1;
      end;
   end;

   do i = 1 to (n(of Firsts(*)));
      do j = 1 to (n(of seconds(*)));
         if 0&amp;lt; seconds[j]- firsts[i] le 6 then 
            put First "occurs in position" +1 firsts[i] "and"+1 Second "occurs at position" +1 Seconds[j];
      end;
   end;

run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Also, it may be time to read a bit about arrays and logic constructs. To output only the pairs where "diet" is 6 or fewer words after "medicine", compare each value pair, as the nested loops above do.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please post code in the box after selecting the "run" icon above. It will preserve formatting and the indents really make it much easier to read nested do loop code frequently needed for arrays.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you look up n(of seconds(*)), you will find that it returns the count of populated (non-missing) cells in the array, so all of your "and ... ne ." conditions are not needed.&lt;/P&gt;
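&lt;P&gt;A minimal sketch of that behaviour, assuming a 4-cell array like the one above with only two cells filled. The values 16 and 22 and the variable name populated are made up for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   array seconds (4) s1-s4;        /* all four cells start out missing */
   seconds[1] = 16;                /* hypothetical word positions */
   seconds[2] = 22;
   populated = n(of seconds(*));   /* n() counts only the non-missing values */
   put populated=;                 /* writes populated=2 to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the cells are filled from index 1 upward without gaps, looping from 1 to that count visits only populated cells.&lt;/P&gt;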
&lt;P&gt;You DID need to add the condition that (position of second - position of first) is greater than 0.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 17:48:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267294#M52814</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-04-29T17:48:22Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267300#M52817</link>
      <description>&lt;P&gt;Pattern matching is ideal for this kind of intricate request:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
xyz='She was prescribed exercise and diet. You may visit next week to take 
further advice about medicine as well as as well diet. You must take diet 
according to your dietician. Later we will think to revise your medicine';
run;

data _null_;
set test;
First = 'medicine' ;
do Second = "well", "diet", "you", "must", "take" ;
    interest = prxmatch(cats("/", First, "(\W+\w+){0,6}\W+", Second, "\b/i"), xyz);
    put (First Second interest) (=)/;
    end;
run;

/* 
 Pattern reads : Find the First word, followed by 0 to 6 words (each a sequence 
 of non-word characters (\W) followed by a sequence of word characters (\w)), 
 followed by a sequence of non-word characters, followed by the 
 Second word, ending on a word boundary (\b). 
 The match is case insensitive (i). 
*/
&lt;/CODE&gt;&lt;/PRE&gt;
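&lt;P&gt;As a rough sketch only, the same pattern could be used in a subsetting IF to keep just the records of interest from the test dataset. The output dataset name of_interest is hypothetical, the two words are hard-coded rather than built with cats(), and the leading \b is an added assumption so that "medicine" must start on a word boundary:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data of_interest;
   set test;
   /* keep a record only when "diet" follows "medicine" with at most 6 words in between */
   if prxmatch("/\bmedicine(\W+\w+){0,6}\W+diet\b/i", xyz);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;prxmatch returns the position of the match, or 0 when there is none, so the subsetting IF drops the records that do not match.&lt;/P&gt;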
&lt;P&gt;Edit: new version + comments&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2016 03:28:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267300#M52817</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-04-30T03:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267334#M52834</link>
      <description>Hi there,&lt;BR /&gt;I am familiar with First. and Last., but the concept of firsts() and seconds() is new to me. In other words, I am new to SAS. Could you kindly point me to some informative material on it to enrich my knowledge? Thank you in advance for your kind reply. Have a nice weekend.&lt;BR /&gt;Regards,&lt;BR /&gt;Deepak</description>
      <pubDate>Fri, 29 Apr 2016 20:14:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267334#M52834</guid>
      <dc:creator>DeepakSwain</dc:creator>
      <dc:date>2016-04-29T20:14:48Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267335#M52835</link>
      <description>Hi there,&lt;BR /&gt;I am very interested in exploring the use of Perl regular expressions in this context. Unfortunately, being a novice in SAS, I have little understanding of them. I tried to run the given code but could not understand the importance of "think". I am looking for records where the first word, medicine, is followed by the second word, diet, with fewer than 7 intervening words.&lt;BR /&gt;Thank you in advance for your valuable input.&lt;BR /&gt;Regards,&lt;BR /&gt;Deepak</description>
      <pubDate>Fri, 29 Apr 2016 20:19:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267335#M52835</guid>
      <dc:creator>DeepakSwain</dc:creator>
      <dc:date>2016-04-29T20:19:37Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267337#M52836</link>
      <description>&lt;P&gt;"think" is a word that is present in the string, but more than 7 words away from "medicine"; hence the result interest=0.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The pattern means: &lt;EM&gt;Find the word medicine, followed by zero to six (sequences of word letters followed by one or more spaces), followed by spaces and the word diet.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 20:28:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267337#M52836</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-04-29T20:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: Word distance algorithm to identify record of interest</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267376#M52841</link>
      <description>&lt;P&gt;See edited version of my code.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2016 03:27:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Word-distance-algorithm-to-identify-record-of-interest/m-p/267376#M52841</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-04-30T03:27:15Z</dc:date>
    </item>
  </channel>
</rss>

