<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc format regex output if input doesnt match above conditions in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874325#M345428</link>
    <pubDate>Sun, 07 May 2023 11:29:18 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2023-05-07T11:29:18Z</dc:date>
    <item>
      <title>Proc format regex output if input doesnt match above conditions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874322#M345426</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I’m trying to define a proc format with regular expressions where if the conditions specified aren’t met, then the output is the portion of the input string after the 2nd full stop.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Example inputs:&lt;/P&gt;&lt;P&gt;Pineapple.colour.yellow&lt;/P&gt;&lt;P&gt;Pinenuts.size.small&lt;/P&gt;&lt;P&gt;Apple.fruit.red&lt;/P&gt;&lt;P&gt;Grape.shape.round&lt;/P&gt;&lt;P&gt;Orange.fresh.juice&lt;/P&gt;&lt;P&gt;Mango.raw.pickled&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;Proc Format;

Invalue $tester (notsorted) 
“s/(.*)(pine)(.*)/pine/i” (regexpe) = _same_
“s/(.*)(apple)(.*)/apple/i” (regexpe) = _same_
“s/(.*)(gra)(.*)/gra/i” (regexpe) = _same_
/*Other= &amp;lt;not sure what goes here&amp;gt;*/
;
Quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I want the outputs for the input strings above to be:&amp;nbsp;&lt;/P&gt;&lt;P&gt;pine&lt;/P&gt;&lt;P&gt;pine&lt;/P&gt;&lt;P&gt;apple&lt;/P&gt;&lt;P&gt;gra&lt;/P&gt;&lt;P&gt;juice&lt;/P&gt;&lt;P&gt;pickled&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;nbsp;If I just define a final regex condition like&lt;/P&gt;&lt;P&gt;“s/.*\..*\.(.*)/\1/“ (i.e. replace any string with 2 full stops with the portion after the 2nd full stop), then it takes a really long time to run when I use the format in a data step.&lt;BR /&gt;My dataset is huge with a lot of long strings.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way I can avoid doing this matching step and just use a regex in the ‘other’ step?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 07 May 2023 10:32:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874322#M345426</guid>
      <dc:creator>Ap01</dc:creator>
      <dc:date>2023-05-07T10:32:21Z</dc:date>
    </item>
    <item>
      <title>Re: Proc format regex output if input doesnt match above conditions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874325#M345428</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/442527"&gt;@Ap01&lt;/a&gt;&amp;nbsp;and welcome to the SAS Support Communities!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regular expressions are a powerful tool, but simpler character functions are usually much faster, so it is better to use them whenever they are sufficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your example, the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p00ab6ey29t2i8n1ihel88tqtga9.htm" target="_blank" rel="noopener"&gt;FIND function&lt;/A&gt; (together with &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p0jshdjy2z9zdzn1h7k90u99lyq6.htm" target="_blank" rel="noopener"&gt;SCAN&lt;/A&gt;&amp;nbsp;for the "other" case) is sufficient:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input c $30.;
cards;
Pineapple.colour.yellow
Pinenuts.size.small
Apple.fruit.red
Grape.shape.round
Orange.fresh.juice
Mango.raw.pickled
;

data want;
set have;
length d $8;
if find(c,'pine','i') then d='pine';
else if find(c,'apple','i') then d='apple';
else if find(c,'gra','i') then d='gra';
else d=scan(c,3,'.');
run;&lt;/CODE&gt;&lt;/PRE&gt;
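&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the timing comparison that follows, the question's $TESTER. informat needs straight quotes and a fourth, fallback regex. A sketch of such a definition is shown here; it is an assumption, since the exact code used for the timing was not posted. It also writes the capture buffer as $1, the form used in the SAS PRX documentation, instead of the question's \1.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
   /* NOTSORTED: the regexes are tried in the order listed and the first match wins */
   invalue $tester (notsorted)
      's/(.*)(pine)(.*)/pine/i'   (regexpe) = _same_
      's/(.*)(apple)(.*)/apple/i' (regexpe) = _same_
      's/(.*)(gra)(.*)/gra/i'     (regexpe) = _same_
      /* assumed fourth regex: for strings with exactly two full stops, keep the text */
      /* after the 2nd one; the greedy .* parts can cause heavy backtracking on long strings */
      's/.*\..*\.(.*)/$1/'        (regexpe) = _same_
   ;
run;&lt;/CODE&gt;&lt;/PRE&gt;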
&lt;P&gt;Using a 6-million-observation input dataset created by stacking one million copies of the above HAVE dataset, the data step creating WANT was about &lt;EM&gt;nine times faster&lt;/EM&gt; on my workstation than an equivalent step using your $TESTER. informat with the fourth regex added (and curly quotes replaced by straight quotes):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set have;
length d $8;
d=input(c,$tester.);
run;&lt;/CODE&gt;&lt;/PRE&gt;
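&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The 6-million-observation test data mentioned above can be built by stacking one million copies of HAVE. A minimal sketch (the dataset name BIG is hypothetical; replace HAVE with BIG in the two WANT steps to repeat the comparison, and expect timings to vary by machine):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data big;
   do i = 1 to 1000000;        /* one million copies ...                      */
      do p = 1 to n;           /* ... of every observation in HAVE            */
         set have point=p nobs=n;  /* P and N are temporary, not written to BIG */
         output;
      end;
   end;
   stop;                       /* required with POINT=, no end of file is hit */
   drop i;
run;&lt;/CODE&gt;&lt;/PRE&gt;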
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
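&lt;P&gt;When the strings are read from a text file, FIND and SCAN can be applied directly to the automatic _INFILE_ variable, without first reading the line into a variable with a formatted INPUT. A sketch, assuming a hypothetical file strings.txt with one string per line:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   infile 'strings.txt' truncover;  /* hypothetical path: one string per line */
   input;                           /* reads the record into _INFILE_ without parsing it */
   length c $30 d $8;
   c = _infile_;                    /* keep the original string, as in HAVE */
   if find(_infile_,'pine','i') then d='pine';
   else if find(_infile_,'apple','i') then d='apple';
   else if find(_infile_,'gra','i') then d='gra';
   else d=scan(_infile_,3,'.');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;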
&lt;P&gt;When the 6 million observations were read from a text file instead, FIND and SCAN (applied, e.g., to the _INFILE_ variable) were still 4 to 6 times faster than the $TESTER. informat.&lt;/P&gt;</description>
      <pubDate>Sun, 07 May 2023 11:29:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874325#M345428</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2023-05-07T11:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: Proc format regex output if input doesnt match above conditions</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874868#M345678</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I think you could define your format like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA test;
input string: $50.;
datalines;
Pineapple.colour.yellow
Pinenuts.size.small
Apple.fruit.red
Grape.shape.round
Orange.fresh.juice
Mango.raw.pickled
;
RUN;

PROC FORMAT;
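   Invalue $tester (notsorted) /* patterns are tried in the order listed */
   's/.*pine.*/pine/i' (regexpe) = _same_   /* listed before apple, so Pineapple gives pine */
   's/.*apple.*/apple/i' (regexpe) = _same_ /* literal replacement text, so the output is lowercase */
   's/.*gra.*/gra/i' (regexpe) = _same_
   's/[^\.]*\.[^\.]*\.(.*)/$1/i' (regexpe) = _same_ /* text after the 2nd full stop */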
   Other=_error_
   ;
run;
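
/* Optional check of the fallback pattern on its own (a sketch, not needed for the format). */
/* [^\.]* cannot run past a full stop, so the regex engine backtracks much less than with   */
/* the .*\..*\. pattern from the question. PRXCHANGE(regex, times, source) applies it once. */
DATA _NULL_;
   length s $50;
   s = prxchange('s/[^\.]*\.[^\.]*\.(.*)/$1/', 1, 'Orange.fresh.juice');
   put s=;   /* writes s=juice to the log */
RUN;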

DATA test2;
   SET test;
   string2=input(string,$tester.);
RUN;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 10 May 2023 09:14:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-format-regex-output-if-input-doesnt-match-above-conditions/m-p/874868#M345678</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2023-05-10T09:14:31Z</dc:date>
    </item>
  </channel>
</rss>

