<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using PRXNEXT to extract multiple phrases from a string in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610602#M177829</link>
    <description>&lt;P&gt;I'm searching through medical notes to capture all instances of a phrase, in particular 'carbapenemase producing'. This phrase can occur more than once in a single string. I've been working with PRXNEXT, which I think is the most applicable function. Take this string as an example:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;If amikacin results are needed, please notify&lt;BR /&gt;Microbiology Lab at ext.xxxx for further testing.&lt;BR /&gt;The organism will be held until x/xx/xx.&lt;BR /&gt;Presumptive Carbapenemase Producing CRE&lt;BR /&gt;See SPMI34 for Carba-R PCR Results&lt;BR /&gt;Not Confirmed Carbapenemase Producing CRE&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;From the comment above, I'd like to extract the phrases&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Presumptive Carbapenemase Producing&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;and&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Not Confirmed Carbapenemase Producing&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Here is the code I've been using; it is still in development:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data chk_one;
  set a01;
  /* optional "not confirmed", then a (possibly misspelled) carbapenemase and producing */
  prx = prxparse('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+))/');
  _start_inout = 1;
  /* look for successive matches until PRXNEXT returns pos=0 */
  do hitnum = 1 by 1 until (pos=0);
    call prxnext (prx, _start_inout, length(as_comments), as_comments, pos, len);
    if len then do;
      content = substr(as_comments,pos,len);
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm able to extract the 2nd phrase, "Not Confirmed Carbapenemase Producing", but the 1st one is still a work in progress. Any help or advice would be appreciated.&lt;/P&gt;</description>
    <pubDate>Tue, 10 Dec 2019 02:36:43 GMT</pubDate>
    <dc:creator>BrianB4233</dc:creator>
    <dc:date>2019-12-10T02:36:43Z</dc:date>
    <item>
      <title>Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610602#M177829</link>
      <description>&lt;P&gt;I'm searching through medical notes to capture all instances of a phrase, in particular 'carbapenemase producing'. This phrase can occur more than once in a single string. I've been working with PRXNEXT, which I think is the most applicable function. Take this string as an example:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;If amikacin results are needed, please notify&lt;BR /&gt;Microbiology Lab at ext.xxxx for further testing.&lt;BR /&gt;The organism will be held until x/xx/xx.&lt;BR /&gt;Presumptive Carbapenemase Producing CRE&lt;BR /&gt;See SPMI34 for Carba-R PCR Results&lt;BR /&gt;Not Confirmed Carbapenemase Producing CRE&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;From the comment above, I'd like to extract the phrases&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Presumptive Carbapenemase Producing&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;and&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Not Confirmed Carbapenemase Producing&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Here is the code I've been using; it is still in development:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data chk_one;
  set a01;
  /* optional "not confirmed", then a (possibly misspelled) carbapenemase and producing */
  prx = prxparse('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+))/');
  _start_inout = 1;
  /* look for successive matches until PRXNEXT returns pos=0 */
  do hitnum = 1 by 1 until (pos=0);
    call prxnext (prx, _start_inout, length(as_comments), as_comments, pos, len);
    if len then do;
      content = substr(as_comments,pos,len);
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm able to extract the 2nd phrase, "Not Confirmed Carbapenemase Producing", but the 1st one is still a work in progress. Any help or advice would be appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 02:36:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610602#M177829</guid>
      <dc:creator>BrianB4233</dc:creator>
      <dc:date>2019-12-10T02:36:43Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610608#M177833</link>
      <description>&lt;P&gt;I would use:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;'/(not confirmed|\w+)\s+carbapenemases? producing/i'&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
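&lt;P&gt;A minimal sketch of this pattern inside a PRXNEXT loop, assuming, as in the question, that the note text is in a character variable AS_COMMENTS. The data set names NOTES and HITS and the one-record sample (the question's note with its lines joined by spaces) are hypothetical. The OUTPUT statement inside the loop writes one row per match:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical one-record sample built from the note in the question */
data notes;
  length as_comments $300;
  as_comments = catx(' ',
    'If amikacin results are needed, please notify',
    'Microbiology Lab at ext.xxxx for further testing.',
    'The organism will be held until x/xx/xx.',
    'Presumptive Carbapenemase Producing CRE',
    'See SPMI34 for Carba-R PCR Results',
    'Not Confirmed Carbapenemase Producing CRE');
run;

data hits;
  set notes;
  length content $100;
  retain prx;
  /* Compile once; the i option makes the match case-insensitive */
  if _n_ = 1 then
    prx = prxparse('/(not confirmed|\w+)\s+carbapenemases? producing/i');
  start = 1;                      /* where the next search begins */
  stop_at = length(as_comments);  /* last character to search */
  call prxnext(prx, start, stop_at, as_comments, pos, len);
  do while (pos gt 0);            /* pos is 0 once no match remains */
    content = substr(as_comments, pos, len);
    output;                       /* one row per match */
    call prxnext(prx, start, stop_at, as_comments, pos, len);
  end;
  keep content;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On this sample, HITS should contain two rows: "Presumptive Carbapenemase Producing" and "Not Confirmed Carbapenemase Producing".&lt;/P&gt;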
&lt;P&gt;Notes: the ? makes the plural "s" optional, and the i at the end makes the match case-insensitive.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 04:56:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610608#M177833</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-12-10T04:56:01Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610612#M177836</link>
      <description>&lt;P&gt;In addition to what&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;&amp;nbsp;wrote:&lt;/P&gt;
&lt;P&gt;1. Compile the RegEx only once. &lt;STRIKE&gt;If you don't then you'll compile a separate RegEx in every iteration of your data step (variable PRX will then have a different value in every iteration - the "pointer" to the compiled RegEx stored in memory).&lt;/STRIKE&gt;&lt;/P&gt;
&lt;PRE&gt;...
retain prx;
if _n_=1 then 
  do;
    prx=prxparse(....
  end;
....&lt;/PRE&gt;
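&lt;P&gt;A minimal sketch of the compile-once skeleton above, filled in with the question's own pattern and loop plus an OUTPUT statement, so that every match gets its own row rather than only the last one. Two assumptions that are not in the question: an i option is added because the sample text is mixed case, and a one-record A01 (three lines of the question's note joined by spaces) stands in for the real data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical stand-in for the question's A01 data set */
data a01;
  length as_comments $300;
  as_comments = catx(' ',
    'Presumptive Carbapenemase Producing CRE',
    'See SPMI34 for Carba-R PCR Results',
    'Not Confirmed Carbapenemase Producing CRE');
run;

data chk_one;
  set a01;
  length content $100;
  retain prx;
  /* Compile the question's pattern once; /i is an added assumption */
  if _n_ = 1 then
    prx = prxparse('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+))/i');
  _start_inout = 1;
  do hitnum = 1 by 1 until (pos = 0);
    call prxnext(prx, _start_inout, length(as_comments), as_comments, pos, len);
    if len then do;
      content = substr(as_comments, pos, len);
      output;   /* write every match, not just the last one */
    end;
  end;
  drop prx _start_inout pos len;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On this sample the first row should hold only "Carbapenemase Producing": the optional group in this pattern recognizes "not confirmed" but not "Presumptive".&lt;/P&gt;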
&lt;P&gt;Given that you are using PRXNEXT(), wouldn't you need an OUTPUT statement somewhere in your loop? Without one, only the last match is kept in the row written at the end of each iteration.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 21:33:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610612#M177836</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-12-10T21:33:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610674#M177870</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;wrote:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;1. Compile the RegEx only once.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PRXPARSE does not recompile an expression whose pattern is a constant string: later calls return the ID of the expression that was already compiled. In this case the string is constant, so PRX keeps the same value on every iteration.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you build the expression from a variable but only want it compiled once, you can add the 'o' option at the end of the pattern, e.g.:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;prx = prxparse(cats('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+',testString,'))/o'));&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;but even that is unnecessary when the string is constant.&lt;/P&gt;
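&lt;P&gt;One way to check that a constant pattern is compiled only once, as a minimal sketch: call PRXPARSE with the same constant pattern on several passes of a loop and look at the IDs it returns. The data set name IDS is hypothetical; if the expression is compiled only once, every row should carry the same PRX value:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ids;
  do i = 1 to 3;
    /* constant pattern: compiled on the first call, reused afterwards */
    prx = prxparse('/carbapenemases? producing/i');
    output;
  end;
run;

proc print data=ids;
run;&lt;/CODE&gt;&lt;/PRE&gt;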
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 11:48:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610674#M177870</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2019-12-10T11:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610838#M177959</link>
      <description>&lt;P&gt;Thanks so much for everyone's replies; they're genuinely appreciated.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PGStats, thank you for your explanation of the ? and the i. The reason I'm using&amp;nbsp;&lt;STRONG&gt;ca[bepr]\w+&lt;/STRONG&gt; is that there are 15+ variant spellings of the word 'carbapenemase', i.e., it is frequently misspelled. I'm particularly interested in&amp;nbsp;&lt;STRONG&gt;(not confirmed|\w+)&lt;/STRONG&gt;: can I look back 2 or even 3 words from 'carbapenemase'?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again, Brian&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 21:16:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610838#M177959</guid>
      <dc:creator>BrianB4233</dc:creator>
      <dc:date>2019-12-10T21:16:09Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610844#M177963</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/76464"&gt;@s_lassen&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
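&lt;P&gt;You are right! It definitely doesn't get recompiled and clog up memory.&lt;/P&gt;
&lt;P&gt;Out of curiosity, I ran the code below. It looks like using a retained variable still provides a measurable performance gain: 6.61 versus 3.21 seconds real time over 100 million iterations. The first set of timings is for the step that calls PRXPARSE on every iteration, the second for the step that calls it only once and retains PRX.&lt;/P&gt;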
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer;
/* 100 million rows of test data */
data have;
  do obs=1 to 100000000;
    output;
  end;
  stop;
run;

/* PRXPARSE is called on every iteration; /o keeps it from recompiling */
data _null_;
  set have;
  prx = prxparse(cats('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+))/o'));
  output;
run;

/* PRXPARSE is called only on the first iteration; the ID is retained */
data _null_;
  set have;
  retain prx;
  if _n_=1 then
    prx = prxparse(cats('/((not confirmed\s*)?(ca[bepr]\w+ prod\w+))/o'));
  output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;      real time           6.61 seconds
      user cpu time       6.44 seconds
      system cpu time     0.16 seconds
      memory              1324.57k
      OS Memory           21664.00k


      real time           3.21 seconds
      user cpu time       3.07 seconds
      system cpu time     0.15 seconds
      memory              1304.21k
      OS Memory           21664.00k&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 21:34:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610844#M177963</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-12-10T21:34:26Z</dc:date>
    </item>
    <item>
      <title>Re: Using PRXNEXT to extract multiple phrases from a string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610896#M177989</link>
      <description>&lt;P&gt;If you know what words you are looking for:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;'/(three word prefix|not confirmed|\w+)\s+carbapenemases? producing/i'&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
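&lt;P&gt;A minimal sketch that combines a list of known qualifiers with the misspelling-tolerant ca[bepr]\w+ from the question, and uses CALL PRXPOSN to pull out just the qualifier captured by the first group. The data set name PREFIXES, the qualifier list, and the sample text are hypothetical:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data prefixes;
  length as_comments $300 prefix $40;
  as_comments = catx(' ',
    'Presumptive Carbapenemase Producing CRE',
    'See SPMI34 for Carba-R PCR Results',
    'Not Confirmed Carbapenemase Producing CRE');
  /* Group 1: a known qualifier, or else any single word */
  prx = prxparse('/(presumptive|not confirmed|\w+)\s+ca[bepr]\w+\s+prod\w+/i');
  start = 1;
  stop_at = length(as_comments);
  call prxnext(prx, start, stop_at, as_comments, pos, len);
  do while (pos gt 0);
    /* Position and length of capture group 1 within the source */
    call prxposn(prx, 1, p1, l1);
    prefix = substr(as_comments, p1, l1);
    output;
    call prxnext(prx, start, stop_at, as_comments, pos, len);
  end;
  keep prefix;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On this sample, PREFIXES should contain "Presumptive" and "Not Confirmed".&lt;/P&gt;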
&lt;P&gt;If you want any one to three words:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;'/(\w+\W+){1,3}carbapenemases? producing/i'&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
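&lt;P&gt;A minimal sketch of the one-to-three-word pattern in a PRXNEXT loop, using a hypothetical data set LOOKBACK and sample text. Because the regex engine returns the leftmost possible match, the pattern takes up to three preceding words whether or not they belong to the qualifier, so on this sample the second hit is expected to begin with "Results":&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data lookback;
  length as_comments $300 content $120;
  as_comments = catx(' ',
    'Presumptive Carbapenemase Producing CRE',
    'See SPMI34 for Carba-R PCR Results',
    'Not Confirmed Carbapenemase Producing CRE');
  /* One to three "word + separator" units before the phrase */
  prx = prxparse('/(\w+\W+){1,3}carbapenemases? producing/i');
  start = 1;
  stop_at = length(as_comments);
  call prxnext(prx, start, stop_at, as_comments, pos, len);
  do while (pos gt 0);
    content = substr(as_comments, pos, len);
    output;
    call prxnext(prx, start, stop_at, as_comments, pos, len);
  end;
  keep content;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If only known qualifiers should be captured, an explicit list of them, without a catch-all word alternative, avoids picking up neighbouring words.&lt;/P&gt;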
&lt;P&gt;Note: using \W instead of \s allows the words to be separated by any run of non-word characters, such as spaces or punctuation.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 04:18:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-PRXNEXT-to-extract-multiple-phrases-from-a-string/m-p/610896#M177989</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-12-11T04:18:50Z</dc:date>
    </item>
  </channel>
</rss>

