<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: prxmatch() regular expression remove roman numbers in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883562#M349099</link>
    <description>&lt;P&gt;I would not use PRX for that, just SCAN:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   set have;
   reason=scan(scan(value,2,' '),1,':,.');
   if countw(value,' ')&amp;gt;2 then 
     ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The ReasonSp calculation is slightly more complicated because it has to handle the "i.e." in the second observation. Otherwise it could simply take the third word; instead we have to take the last word, and only when there are more than 2 words.&lt;/P&gt;</description>
    <pubDate>Wed, 05 Jul 2023 13:04:57 GMT</pubDate>
    <dc:creator>s_lassen</dc:creator>
    <dc:date>2023-07-05T13:04:57Z</dc:date>
    <item>
      <title>prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883411#M349037</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to create two variables: one containing TEXT1 and the other containing TEXT2.&lt;/P&gt;&lt;P&gt;The code below does not work:&lt;/P&gt;&lt;P&gt;Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));&lt;BR /&gt;ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please help?&lt;/P&gt;&lt;P&gt;This is my Value variable:&lt;/P&gt;&lt;P&gt;0. TEXT1.&lt;BR /&gt;I. TEXT1, i.e. TEXT2.&lt;BR /&gt;II.a. TEXT1: TEXT2.&lt;BR /&gt;II.b. TEXT1: TEXT2.&lt;BR /&gt;III.a. TEXT1: TEXT2.&lt;BR /&gt;III.b. TEXT1: TEXT2.&lt;BR /&gt;III.c. TEXT1: TEXT2.&lt;BR /&gt;IV. TEXT1.&lt;BR /&gt;V.a. TEXT1: TEXT2.&lt;BR /&gt;V.b. TEXT1: TEXT2.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jul 2023 10:42:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883411#M349037</guid>
      <dc:creator>starosto</dc:creator>
      <dc:date>2023-07-04T10:42:42Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883434#M349049</link>
      <description>&lt;P&gt;Is this what you expect?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 04 Jul 2023 11:54:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883434#M349049</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2023-07-04T11:54:30Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883515#M349091</link>
      <description>&lt;P&gt;Or maybe this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;&lt;/CODE&gt;&lt;/PRE&gt;
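&lt;P&gt;Note that [0MDCLXVI]+ in the pattern above accepts any run of those characters, including strings such as "IIII" or "VX" that are not well-formed numerals. As a rough sketch of a stricter check, the widely used pattern for the values 1 to 3999 could be applied as shown here. The data set name check and the variables Prefix and IsRoman are hypothetical, and only the leading token before the first dot is tested:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data check;
   set have;
   length Prefix $20;
   * First token before a dot or blank, e.g. II from II.a.;
   Prefix = scan(Value, 1, '. ');
   * 1 if Prefix is a well-formed Roman numeral (1 to 3999), else 0;
   IsRoman = (prxmatch('/^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/', strip(Prefix)) ne 0);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The lookahead keeps the pattern from matching an empty string. The prefix "0" in the first observation is not a Roman numeral, so it would need separate handling.&lt;/P&gt;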
&lt;P&gt;You'll find more restrictive matching of roman numerals&amp;nbsp;&lt;A href="https://www.oreilly.com/library/view/regular-expressions-cookbook/9780596802837/ch06s09.html" target="_blank" rel="noopener"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jul 2023 08:00:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883515#M349091</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2023-07-05T08:00:16Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883562#M349099</link>
      <description>&lt;P&gt;I would not use PRX for that, just SCAN:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   set have;
   reason=scan(scan(value,2,' '),1,':,.');
   if countw(value,' ')&amp;gt;2 then 
     ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The ReasonSp calculation is slightly more complicated because it has to handle the "i.e." in the second observation. Otherwise it could simply take the third word; instead we have to take the last word, and only when there are more than 2 words.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jul 2023 13:04:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883562#M349099</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2023-07-05T13:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883600#M349103</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="starosto_0-1688574801531.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/85569i21972F0E7616AA24/image-size/medium?v=v2&amp;amp;px=400" role="button" title="starosto_0-1688574801531.png" alt="starosto_0-1688574801531.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Almost OK. I do not want the "i.e." in the result, and I also do not want the dots at the end. Thanks! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jul 2023 16:34:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883600#M349103</guid>
      <dc:creator>starosto</dc:creator>
      <dc:date>2023-07-05T16:34:17Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883602#M349104</link>
      <description>&lt;P&gt;Thank you! This is not universal: it depends on the number of words. TEXT1 or TEXT2 could each be a whole sentence.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jul 2023 16:54:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883602#M349104</guid>
      <dc:creator>starosto</dc:creator>
      <dc:date>2023-07-05T16:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883738#M349149</link>
      <description>&lt;P&gt;I think the best approach would be to perform the final cleaning in a second step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But depending on&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;how much data cleaning you need to finally get TEXT1 and TEXT2&lt;/LI&gt;
&lt;LI&gt;whether you want to check that the syntax conforms or not&lt;/LI&gt;
&lt;LI&gt;your ability to re-understand, debug, or refine a regex you wrote some time ago&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;you may want to come up with a solution like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data want;
   set have;
   length Reason ReasonSp $200;
   *Roughly extract TEXT1 and TEXT2;
   Reason   = strip(substrn(scan(value,1,':,'),find(Value,' ')));
   ReasonSp = scan(value,2,':,');

   *Data cleaning;
   array checkNClean Reason ReasonSp;
   do over checkNClean;
      *Remove trailing dot;
      if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
      *Remove leading i.e.;
      if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
   end;
run;
proc print;run;&lt;/CODE&gt;&lt;/PRE&gt;
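&lt;P&gt;If you would rather stay with PRX, the same cleaning can probably be done with a single pattern. The following is only a sketch, checked by hand against the sample values in this thread. The data set name want_prx and the variable rx are made up for illustration, and it assumes that the numbering prefix never contains a blank and that TEXT1 itself contains no colon or comma:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want_prx;
   set have;
   length Reason ReasonSp $200;
   retain rx;
   * Compile the pattern once. Group 1 = TEXT1, group 2 = TEXT2 (optional);
   if _n_ = 1 then rx = prxparse('/^\s*\S+\s+([^:,]*?)(?:[:,]\s*(?:i\.e\.\s*)?(.*?))?\.?\s*$/');
   if prxmatch(rx, Value) then do;
      Reason   = prxposn(rx, 1, Value);
      ReasonSp = prxposn(rx, 2, Value);
   end;
   drop rx;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because the groups are delimited by the first colon or comma rather than by word position, TEXT1 and TEXT2 may each span several words. ReasonSp stays blank when there is no separator.&lt;/P&gt;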
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jul 2023 14:36:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/883738#M349149</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2023-07-06T14:36:45Z</dc:date>
    </item>
    <item>
      <title>Re: prxmatch() regular expression remove roman numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/884401#M349393</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/441828"&gt;@starosto&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thank you! This is not universal: it depends on the number of words. TEXT1 or TEXT2 could each be a whole sentence.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;GIGO&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Incomplete problem description yields incomplete solutions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since C, D, L and M are also Roman numerals, you may need to provide a much more detailed description of your data and the rules involved.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jul 2023 22:15:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/prxmatch-regular-expression-remove-roman-numbers/m-p/884401#M349393</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-07-11T22:15:11Z</dc:date>
    </item>
  </channel>
</rss>

