<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Array of regular expressions in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807424#M318346</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/180895"&gt;@msauer&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As always,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15410"&gt;@data_null__&lt;/a&gt;'s solution is correct. Last month a similar issue was discussed in&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Programming/PRXMATCH-not-work-in-nested-loop/m-p/799715" target="_blank" rel="noopener"&gt;PRXMATCH not work in nested loop&lt;/A&gt;, where it was&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462" target="_blank" rel="noopener"&gt;PGStats&lt;/A&gt;&amp;nbsp;who suggested the array of IDs of compiled patterns created with the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p06i7305izsnvcn1ru9147suzyv5.htm" target="_blank" rel="noopener"&gt;PRXPARSE function&lt;/A&gt;. The issue is not your array itself, but the fact that varying patterns are used in the same call of the PRXCHANGE or PRXPARSE function in the code (in a DO loop), which conflicts with the use of the "o" ("compile once") option.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just for demonstration (not meant as a solution)&lt;FONT face="helvetica"&gt;:&lt;/FONT&gt; Replace the DO loop in the "minimal example" of your initial post with two one-iteration loops and &lt;FONT face="courier new,courier"&gt;with_array&lt;/FONT&gt; &lt;EM&gt;will&lt;/EM&gt; be updated correctly:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  do i = 1 to dim(rules)-1;
    with_array = prxchange(rules[i], -1, with_array);
  end;
  do i = 2 to dim(rules);
    with_array = prxchange(rules[i], -1, with_array);
  end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you said, you can safely omit the&amp;nbsp;"o" option (i.e., remove the "o" either in dataset RULES or by adjustments to the code) when the regular expressions are compiled only once anyway because of the "&lt;FONT face="courier new,courier"&gt;if _n_=1&lt;/FONT&gt; ..." block and the PRXPARSE function. If you keep the "o" option there, the single PRXPARSE call in the DO loop compiles only the first pattern, so the elements of the &lt;FONT face="courier new,courier"&gt;rules_id&lt;/FONT&gt; array will all contain the same value (1) rather than 1, 2, ... and hence only represent the first rule. You can insert&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;put rules_id[i] = ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;after the assignment statement&amp;nbsp;&lt;FONT face="courier new,courier"&gt;rules_id[i] =&lt;/FONT&gt; ... to see the difference. I would define the &lt;FONT face="courier new,courier"&gt;rules_id&lt;/FONT&gt; array as &lt;FONT face="courier new,courier"&gt;_temporary_&lt;/FONT&gt;&amp;nbsp;(advantage: automatic RETAIN and DROP). The dimension of the array does not need to match &lt;FONT face="courier new,courier"&gt;dim(rules)&lt;/FONT&gt; exactly as long as it's greater than or equal to that value, e.g. 9999. The DO loops will use &lt;FONT face="courier new,courier"&gt;dim(&lt;EM&gt;rules&lt;/EM&gt;)&lt;/FONT&gt; as their end value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think the documentation means that the "o" option tells the compiler to compile the regular expression only once if it is in fact&amp;nbsp;&lt;EM&gt;constant&lt;/EM&gt;, yet provided as a &lt;EM&gt;variable&lt;/EM&gt; (which in principle &lt;EM&gt;could&lt;/EM&gt; change its value) in the code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Simple example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test(drop=ptn);
retain ptn 's/(C\w+) \w+ (Disease)/$1 $2/o';
set sashelp.heart;
length shortDC $16;
shortDC=prxchange(ptn,1,DeathCause);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Omitting the "o" increases the run time considerably (but it's still &amp;lt;1 second on my computer).&lt;/P&gt;</description>
    <pubDate>Tue, 12 Apr 2022 16:01:33 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2022-04-12T16:01:33Z</dc:date>
    <item>
      <title>Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807343#M318295</link>
      <description>&lt;P&gt;I am doing address normalization using a large number of regular expressions of the form&lt;/P&gt;&lt;PRE&gt;s/pattern/replace/&lt;/PRE&gt;&lt;P&gt;Each regex is stored as a string in a dataset with one row and several hundred columns, so that I can do&lt;/P&gt;&lt;PRE&gt;data want;
  set addresses;
  if _n_ = 1 then do;
    set rules;
    array rules [*] rules:;
  end;
  do i = 1 to dim(rules);
    address = prxchange(rules[i], -1, address);
  end;
run;&lt;/PRE&gt;&lt;P&gt;Each rule is a constant, thus I added the "o" modifier to compile each regex only once. However, then only the first regex is compiled.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Consider the following minimal example to illustrate this.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data _null_;
  rules1 = "s/[-]/ /o";
  rules2 = "s/(\().*//o";
  array rules [*] rules:;
  name = "Berlin-Mitte (Germany)";
  with_array = upcase(name);
  do i = 1 to dim(rules);
    with_array = prxchange(rules[i], -1, with_array);
  end;
  no_array = upcase(name);
  no_array = prxchange(rules1, -1, no_array);
  no_array = prxchange(rules2, -1, no_array);
  put with_array=;
  put no_array=;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;This outputs&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;with_array=BERLIN MITTE (GERMANY) &amp;lt;&amp;lt;&amp;lt; should be BERLIN MITTE
no_array=BERLIN MITTE&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;where the second regex was not applied in the DO loop. If I omit the "o" modifier in the regex, everything works as expected.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What am I missing here?&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 10:59:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807343#M318295</guid>
      <dc:creator>msauer</dc:creator>
      <dc:date>2022-04-12T10:59:25Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807346#M318297</link>
      <description>&lt;P&gt;Could you provide some example data? If your data doesn't contain any sensitive information, you can use this macro to convert your data set into a DATALINES statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/" target="_blank" rel="noopener"&gt;https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 11:16:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807346#M318297</guid>
      <dc:creator>maguiremq</dc:creator>
      <dc:date>2022-04-12T11:16:08Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807351#M318302</link>
      <description>The question includes a minimal example with two regular expressions and one "address" to illustrate the issue. No need for more data.</description>
      <pubDate>Tue, 12 Apr 2022 11:27:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807351#M318302</guid>
      <dc:creator>msauer</dc:creator>
      <dc:date>2022-04-12T11:27:20Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807376#M318317</link>
      <description>&lt;P&gt;Use PRXPARSE&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   if _n_ eq 1 then do;
      rules1 = prxparse("s/[-]/ /o");
      rules2 = prxparse("s/(\().*//o");
      array rules [*] rules:;
      retain rules:;
      end;
   name = "Berlin-Mitte (Germany)";
   with_array = upcase(name);
   do i = 1 to dim(rules);
      with_array = prxchange(rules[i], -1, with_array);
      end;

   no_array = upcase(name);
   no_array = prxchange("s/[-]/ /o", -1, no_array);
   no_array = prxchange("s/(\().*//o", -1, no_array);
   put with_array=;
   put no_array=;
   run;

with_array=BERLIN MITTE
no_array=BERLIN MITTE&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 12 Apr 2022 12:59:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807376#M318317</guid>
      <dc:creator>data_null__</dc:creator>
      <dc:date>2022-04-12T12:59:28Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807402#M318331</link>
      <description>&lt;P&gt;The same issue occurs with PRXPARSE, too. PRXDEBUG only shows compiling the first regex.&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data _null_;
  if _n_ eq 1 then do;
    rules1 = "s/[-]/ /o";
    rules2 = "s/(\().*//o";
    array rules [*] rules:;
    array rules_id [2];
    do i = 1 to dim(rules);
      rules_id[i] = prxparse(rules[i]);
    end;
    retain rules:;
  end;
  name = "Berlin-Mitte (Germany)";
  with_array = upcase(name);
  do i = 1 to dim(rules);
    with_array = prxchange(rules_id[i], -1, with_array);
  end;
  no_array = upcase(name);
  no_array = prxchange("s/[-]/ /o", -1, no_array);
  no_array = prxchange("s/(\().*//o", -1, no_array);
  put with_array=;
  put no_array=;
run;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Of course, I can omit the "o" modifier with this construct, since I explicitly compile the regex only once. But isn't the whole benefit of the modifier that this should not be required? At least the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/v_021/lefunctionsref/p06i7305izsnvcn1ru9147suzyv5.htm" target="_self"&gt;documentation&lt;/A&gt; says so:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;This behavior simplifies the code because you do not need to use an initialization block (IF _N_ =1) to initialize Perl regular expressions.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Tue, 12 Apr 2022 13:53:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807402#M318331</guid>
      <dc:creator>msauer</dc:creator>
      <dc:date>2022-04-12T13:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807424#M318346</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/180895"&gt;@msauer&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As always,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15410"&gt;@data_null__&lt;/a&gt;'s solution is correct. Last month a similar issue was discussed in&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Programming/PRXMATCH-not-work-in-nested-loop/m-p/799715" target="_blank" rel="noopener"&gt;PRXMATCH not work in nested loop&lt;/A&gt;, where it was&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462" target="_blank" rel="noopener"&gt;PGStats&lt;/A&gt;&amp;nbsp;who suggested the array of IDs of compiled patterns created with the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p06i7305izsnvcn1ru9147suzyv5.htm" target="_blank" rel="noopener"&gt;PRXPARSE function&lt;/A&gt;. The issue is not your array itself, but the fact that varying patterns are used in the same call of the PRXCHANGE or PRXPARSE function in the code (in a DO loop), which conflicts with the use of the "o" ("compile once") option.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just for demonstration (not meant as a solution)&lt;FONT face="helvetica"&gt;:&lt;/FONT&gt; Replace the DO loop in the "minimal example" of your initial post with two one-iteration loops and &lt;FONT face="courier new,courier"&gt;with_array&lt;/FONT&gt; &lt;EM&gt;will&lt;/EM&gt; be updated correctly:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;  do i = 1 to dim(rules)-1;
    with_array = prxchange(rules[i], -1, with_array);
  end;
  do i = 2 to dim(rules);
    with_array = prxchange(rules[i], -1, with_array);
  end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you said, you can safely omit the&amp;nbsp;"o" option (i.e., remove the "o" either in dataset RULES or by adjustments to the code) when the regular expressions are compiled only once anyway because of the "&lt;FONT face="courier new,courier"&gt;if _n_=1&lt;/FONT&gt; ..." block and the PRXPARSE function. If you keep the "o" option there, the single PRXPARSE call in the DO loop compiles only the first pattern, so the elements of the &lt;FONT face="courier new,courier"&gt;rules_id&lt;/FONT&gt; array will all contain the same value (1) rather than 1, 2, ... and hence only represent the first rule. You can insert&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;put rules_id[i] = ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;after the assignment statement&amp;nbsp;&lt;FONT face="courier new,courier"&gt;rules_id[i] =&lt;/FONT&gt; ... to see the difference. I would define the &lt;FONT face="courier new,courier"&gt;rules_id&lt;/FONT&gt; array as &lt;FONT face="courier new,courier"&gt;_temporary_&lt;/FONT&gt;&amp;nbsp;(advantage: automatic RETAIN and DROP). The dimension of the array does not need to match &lt;FONT face="courier new,courier"&gt;dim(rules)&lt;/FONT&gt; exactly as long as it's greater than or equal to that value, e.g. 9999. The DO loops will use &lt;FONT face="courier new,courier"&gt;dim(&lt;EM&gt;rules&lt;/EM&gt;)&lt;/FONT&gt; as their end value.&lt;/P&gt;
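&lt;P&gt;A minimal sketch of that suggestion, assuming the two rules from the "minimal example" with the "o" option removed and an arbitrary upper bound of 9999 for the temporary array (adapt both to your RULES dataset):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  /* Rules from the minimal example, here without the "o" option */
  rules1 = "s/[-]/ /";
  rules2 = "s/(\().*//";
  array rules [*] rules:;
  /* Temporary array: retained automatically and never written to a dataset. */
  /* 9999 is an assumed upper bound; it only has to be at least dim(rules).  */
  array rules_id [9999] _temporary_;
  /* Compile every pattern exactly once, in the first iteration of the step */
  if _n_ = 1 then do i = 1 to dim(rules);
    rules_id[i] = prxparse(rules[i]);
  end;
  name = "Berlin-Mitte (Germany)";
  with_array = upcase(name);
  /* Apply the compiled patterns via their IDs; the loop ends at dim(rules) */
  do i = 1 to dim(rules);
    with_array = prxchange(rules_id[i], -1, with_array);
  end;
  put with_array=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because each pattern is compiled by an explicit PRXPARSE call under &lt;FONT face="courier new,courier"&gt;_n_ = 1&lt;/FONT&gt;, the "o" option is not needed in this construct.&lt;/P&gt;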
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think the documentation means that the "o" option tells the compiler to compile the regular expression only once if it is in fact&amp;nbsp;&lt;EM&gt;constant&lt;/EM&gt;, yet provided as a &lt;EM&gt;variable&lt;/EM&gt; (which in principle &lt;EM&gt;could&lt;/EM&gt; change its value) in the code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Simple example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test(drop=ptn);
retain ptn 's/(C\w+) \w+ (Disease)/$1 $2/o';
set sashelp.heart;
length shortDC $16;
shortDC=prxchange(ptn,1,DeathCause);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Omitting the "o" increases the run time considerably (but it's still &amp;lt;1 second on my computer).&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 16:01:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807424#M318346</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2022-04-12T16:01:33Z</dc:date>
    </item>
    <item>
      <title>Re: Array of regular expressions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807550#M318406</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;for the detailed explanation.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Apr 2022 05:24:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Array-of-regular-expressions/m-p/807550#M318406</guid>
      <dc:creator>msauer</dc:creator>
      <dc:date>2022-04-13T05:24:16Z</dc:date>
    </item>
  </channel>
</rss>

