<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract words before and after regex pattern from text in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616728#M180598</link>
    <description>&lt;P&gt;Can you show an example of what you want to do?&lt;/P&gt;</description>
    <pubDate>Sat, 11 Jan 2020 22:03:16 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2020-01-11T22:03:16Z</dc:date>
    <item>
      <title>Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616724#M180595</link>
      <description>&lt;P&gt;Hi, I'm extracting information from a string based on keywords that are predefined by regex patterns. My question is: how do I get the 4 words before and the 4 words after each keyword, and save them into two separate columns? BIG thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let's say the regex pattern is defined as:&lt;/P&gt;&lt;P&gt;patternID = prxparse('/a \w+ fruit/i');&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jan 2020 21:16:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616724#M180595</guid>
      <dc:creator>inyli</dc:creator>
      <dc:date>2020-01-11T21:16:36Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616728#M180598</link>
      <description>&lt;P&gt;Can you show an example of what you want to do?&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jan 2020 22:03:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616728#M180598</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-01-11T22:03:16Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616731#M180601</link>
      <description>&lt;P&gt;I'm not sure how to do it with regular expressions, but it should be fairly easy to do with the SCAN function. For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=i j);
  length string before after $200;
  input string &amp;amp;;
  do i=1 to countw(string," ,");
    if scan(string,i," ,",'i')="fruit" then do;
      do j=max(1,i-4) to i-1;
        before=catx(' ',before,scan(string,j," ,",'i'));
      end;
      do j=i+1 to min(i+4,countw(string," ,",'i'));
        after=catx(' ',after,scan(string,j," ,",'i'));
      end;
      leave;
    end;
  end;
  cards;
word1 word2 word3 word4 word5 word6 fruit word7 word8 word9 word10 word11
word1 word2's fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
word1 word2 fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
Some pretend that a tomato is a real fruit, others say it's a vegetable
Some say that a tomato is a fruit, I'd say it is a vegetable
Mr Afruiting told us not to eat fruit like apples, pears and oranges
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
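&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the keyword still has to be located with the regular expression from the question, one possible hybrid lets CALL PRXSUBSTR find the match and then applies the same SCAN/CATX logic to the text on either side of it. This is only a sketch, and it makes several assumptions. It uses the question's pattern '/a \w+ fruit/i', treats blanks and commas as the only word delimiters, and uses just the first match in each line. The data set name want2 and the helper variables pre, post, n and k are arbitrary.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want2 (drop=patternID pos len pre post n k);
  length string before after pre post $200;
  input string &amp;amp;;
  /* Keyword pattern taken from the question */
  patternID = prxparse('/a \w+ fruit/i');
  /* Position and length of the first match (pos=0 if there is none) */
  call prxsubstr(patternID, string, pos, len);
  if pos gt 0 then do;
    /* Text to the left and to the right of the matched keyword */
    if pos gt 1 then pre = substr(string, 1, pos-1);
    if pos+len le vlength(string) then post = substr(string, pos+len);
    /* Last (up to) 4 words of the left-hand text */
    n = countw(pre, " ,");
    do k = max(1, n-3) to n;
      before = catx(' ', before, scan(pre, k, " ,"));
    end;
    /* First (up to) 4 words of the right-hand text */
    do k = 1 to min(4, countw(post, " ,"));
      after = catx(' ', after, scan(post, k, " ,"));
    end;
  end;
  cards;
Some pretend that a tomato is a real fruit, others say it's a vegetable
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For the tomato line this should give before = "that a tomato is" and after = "others say it's a". Because only blanks and commas delimit words here, "it's" counts as one word.&lt;/P&gt;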
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note: I edited this post to include improved code.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jan 2020 22:12:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616731#M180601</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2020-01-13T22:12:23Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616756#M180618</link>
      <description>&lt;P&gt;Use PRXPOSN to extract the capture buffers:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
length line before after $100;
input line &amp;amp;;
/* Buffer 1: up to 4 words before the keyword; buffer 3: the keyword itself; buffer 4: up to 4 words after */
prxId = prxparse("/((\w+\W+){0,4})(a \w+ fruit)\W+((\w+\W+){0,4})/i");
if prxmatch(prxId, line) then do;
	before = prxposn(prxId, 1, line);
	after = prxposn(prxId, 4, line);
end;
drop prxId;
datalines;
Some pretend that a tomato is a real fruit, others say it's a vegetable
;
run;

proc print data=test noobs; run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;line 	before 	after
Some pretend that a tomato is a real fruit, others say it's a vegetable 	that a tomato is 	others say it's&lt;/PRE&gt;</description>
      <pubDate>Sun, 12 Jan 2020 05:29:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616756#M180618</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2020-01-12T05:29:19Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616802#M180643</link>
      <description>&lt;P&gt;THANK YOU!!!&lt;/P&gt;</description>
      <pubDate>Sun, 12 Jan 2020 17:23:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616802#M180643</guid>
      <dc:creator>inyli</dc:creator>
      <dc:date>2020-01-12T17:23:46Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616807#M180644</link>
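      <description>&lt;P&gt;While I like the regular expression approach, the suggested expression doesn't quite do what you want. For example, it counts "it's" as two words, so for the tomato example it returns only "others say it's" after the keyword.&lt;/P&gt;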
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I revised my suggested code to account for the test string that &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt; posted, as well as a few more variants.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd suggest that someone revise &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;'s code so that it produces the same results as the following code and examples:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=i j);
  length string before after $200;
  input string &amp;amp;;
  do i=1 to countw(string," ,");
    if scan(string,i," ,",'i')="fruit" then do;
      do j=max(1,i-4) to i-1;
        before=catx(' ',before,scan(string,j," ,",'i'));
      end;
      do j=i+1 to min(i+4,countw(string," ,",'i'));
        after=catx(' ',after,scan(string,j," ,",'i'));
      end;
      leave;
    end;
  end;
  cards;
word1 word2 word3 word4 word5 word6 fruit word7 word8 word9 word10 word11
word1 word2's fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
word1 word2 fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
Some pretend that a tomato is a real fruit, others say it's a vegetable
Some say that a tomato is a fruit, I'd say it is a vegetable
Mr Afruiting told us not to eat fruit like apples, pears and oranges
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Mon, 13 Jan 2020 22:09:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/616807#M180644</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2020-01-13T22:09:55Z</dc:date>
    </item>
    <item>
      <title>Re: Extract words before and after regex pattern from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/617080#M180762</link>
      <description>&lt;P&gt;While I don't fully understand regular expressions, I played around with the one suggested by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt; and came up with the following, which I think correctly handles all of the examples I proposed in my last post:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
  length line before after $200;
  input line &amp;amp;;
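  /* A word is a run of letters, digits, underscores or apostrophes;
     buffer 1 = up to 4 words before "fruit", buffer 4 = up to 4 words after */
  prxId = prxparse("/(([\w\']+\W+){0,4})(fruit)\W+(([\w\']+\W+){0,4})/i");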
  if prxmatch(prxId, line) then do;
	before = prxposn(prxId, 1, line);
	after = prxposn(prxId, 4, line);
  end;
  drop prxId;
  datalines;
word1 word2 word3 word4 word5 word6 fruit word7 word8 word9 word10 word11
word1 word2's fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
word1 word2 fruit word3 word4 word5 word6 word7 word8 word9 word10 word11
Some pretend that a tomato is a real fruit, others say it's a vegetable
Some say that a tomato is a fruit, I'd say it is a vegetable
Mr Afruiting told us not to eat fruit like apples, pears and oranges
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
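&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A possible further variant, sketched here and checked by hand only against the tomato example, differs from the code above in three ways. It keeps the question's original keyword pattern a \w+ fruit instead of the bare word fruit. Like the code above, it treats apostrophes as part of a word. It also adds \b so that the keyword cannot start in the middle of a longer word (for example, at the final "a" of "banana"). The data set name test2 is arbitrary.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
  length line before after $200;
  input line &amp;amp;;
  /* Buffer 1: up to 4 words before the keyword; buffer 3: the keyword
     a \w+ fruit; buffer 4: up to 4 words after it.
     A "word" is a run of letters, digits, underscores or apostrophes. */
  prxId = prxparse("/(([\w']+\W+){0,4})\b(a \w+ fruit)\W+(([\w']+\W+){0,4})/i");
  if prxmatch(prxId, line) then do;
    before = prxposn(prxId, 1, line);
    after  = prxposn(prxId, 4, line);
  end;
  drop prxId;
  datalines;
Some pretend that a tomato is a real fruit, others say it's a vegetable
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For the tomato line this should give before = "that a tomato is" and after = "others say it's a".&lt;/P&gt;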
</description>
      <pubDate>Mon, 13 Jan 2020 22:15:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-words-before-and-after-regex-pattern-from-text/m-p/617080#M180762</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2020-01-13T22:15:15Z</dc:date>
    </item>
  </channel>
</rss>

