<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: improving the efficiency of regex with a long list of patterns and observations in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461255#M117308</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My understanding was that recent versions of SAS (9.4?) apply the o suffix by default when the RegEx string is a constant.&lt;/P&gt;
&lt;P&gt;I can't find a source though, so maybe I am mistaken.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Update: I did a quick test, this runs the same with and without the o.&lt;/P&gt;
&lt;PRE&gt;data _null_;
 do I=1 to 1e7; 
   R=prxmatch('/\d\w\d/o',cat(I));
 end;
run;&lt;/PRE&gt;</description>
    <pubDate>Thu, 10 May 2018 05:59:10 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2018-05-10T05:59:10Z</dc:date>
    <item>
      <title>improving the efficiency of regex with a long list of patterns and observations</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461177#M117277</link>
      <description>&lt;P&gt;In addition to having a long list of patterns (over 50) to check using regex, I need to check these patterns against more than 700,000 observations.&lt;/P&gt;
&lt;P&gt;Does anyone have any advice for improving efficiency?&lt;/P&gt;
&lt;P&gt;Here's the macro I'm using to accomplish this task:&lt;/P&gt;
&lt;PRE&gt;%macro prx(pattern,serial);
  b=prxparse("&amp;amp;pattern");
  if prxmatch(b,serial_number)&amp;gt;0 then do;
    check=1;
    serial=&amp;amp;serial;
    if (length(serial) = length(serial_number)) then check=2;
  end;
%mend;&lt;/PRE&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 20:04:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461177#M117277</guid>
      <dc:creator>gzr2mz39</dc:creator>
      <dc:date>2018-05-09T20:04:20Z</dc:date>
    </item>
    <item>
      <title>Re: improving the efficiency of regex with a long list of patterns and observations</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461214#M117288</link>
      <description>&lt;P&gt;The first things that come to mind, without knowing more:&lt;/P&gt;
&lt;P&gt;- can you use functions like&amp;nbsp;&lt;FONT face="courier new,courier"&gt;index()&lt;/FONT&gt;&amp;nbsp;or similar? They are a lot cheaper than RegEx.&lt;/P&gt;
&lt;P&gt;- can you use &lt;FONT face="courier new,courier"&gt;else if&lt;/FONT&gt;&amp;nbsp;to avoid searching once a pattern is matched?&lt;/P&gt;
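&lt;P&gt;A minimal sketch of both ideas, assuming a hypothetical input data set SERIALS and made-up patterns (the literal 'ABC' and the regex are illustrations only, not taken from the original macro):&lt;/P&gt;
&lt;PRE&gt;data checked;
  set serials;             /* hypothetical data set holding serial_number */
  check=0;
  /* cheap literal test first: index() returns 0 when not found */
  if index(serial_number,'ABC') then check=1;
  /* the regex is evaluated only when the literal test failed */
  else if prxmatch('/^\d{3}-[A-Z]{2}/',serial_number) then check=1;
run;&lt;/PRE&gt;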
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This may be cheaper too (passing the pattern straight to prxmatch instead of calling prxparse first):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;if prxmatch("&amp;amp;pattern",serial_number)&amp;gt;0 then do;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 22:17:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461214#M117288</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-09T22:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: improving the efficiency of regex with a long list of patterns and observations</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461241#M117304</link>
      <description>&lt;P&gt;Make sure your pattern uses the "o" suffix, as in "/abc[a-c]+/o", as it signals to&amp;nbsp;the compiler that the pattern is a constant that only needs to be compiled once.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 02:59:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461241#M117304</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-05-10T02:59:04Z</dc:date>
    </item>
    <item>
      <title>Re: improving the efficiency of regex with a long list of patterns and observations</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461247#M117306</link>
      <description>&lt;P&gt;As others already wrote: Certainly use ELSE and use functions like find() or index() where possible.&lt;/P&gt;
&lt;P&gt;If leading and trailing blanks are not important then use STRIP() as well: prxmatch(&amp;lt;regex&amp;gt;,strip(&amp;lt;variable&amp;gt;))&lt;/P&gt;
&lt;P&gt;And last but not least: tweak your RegEx, especially the ones applied to long strings, e.g. greedy vs. lazy quantifiers.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 04:13:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461247#M117306</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-05-10T04:13:21Z</dc:date>
    </item>
    <item>
      <title>Re: improving the efficiency of regex with a long list of patterns and observations</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461255#M117308</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My understanding was that recent versions of SAS (9.4?) apply the o suffix by default when the RegEx string is a constant.&lt;/P&gt;
&lt;P&gt;I can't find a source though, so maybe I am mistaken.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Update: I did a quick test, this runs the same with and without the o.&lt;/P&gt;
&lt;PRE&gt;data _null_;
 do I=1 to 1e7; 
   R=prxmatch('/\d\w\d/o',cat(I));
 end;
run;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 May 2018 05:59:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improving-the-efficiency-of-regex-with-a-long-list-of-patterns/m-p/461255#M117308</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-10T05:59:10Z</dc:date>
    </item>
  </channel>
</rss>

