<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SQL-expression 200-byte capping inconsistency in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SQL-expression-200-byte-capping-inconsistency/m-p/682073#M206441</link>
    <description>&lt;P&gt;I've been puzzling over this post (&lt;A href="https://communities.sas.com/t5/SAS-Programming/Case-statement-trims-length-to-200/m-p/336084/thread-id/76157" target="_self"&gt;Case-statement-trims-length-to-200&lt;/A&gt;) by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt; - and the weirdness continues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before I found Chris's extensive analysis of what was going on with his problem, I managed to get a Perl expression to work at more than 200 bytes. But I had to jump through hoops to do it. And I still don't know quite what's going on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A few of our source files have 32k-byte HTML text in them, which requires decoding and tag stripping. Not only that, but the often contain tabs, carriage returns and the like - all of which are a barrier for textual analysis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My original code was&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;strip(compbl(translate(prxchange("s/&amp;lt;.*?&amp;gt;/ /", -1, htmldecode(htmldecode(message))), '     ', '08090a0b0c0d'x))) as message length=32767&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It was unwieldy, but it worked. (htmldecode is called twice, because sometimes that's the only way of getting rid of the special characters - I don't control the source data!) And then&amp;nbsp;I looked at the output and saw that no value was longer than 200 bytes - often less because of utf-8.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But cutting out one function at a time,&amp;nbsp; I found that prxchange was working as expected. So I created an extra work column and added one more function at a time, and it still worked.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's what I've ended up with:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;prxchange("s/&amp;lt;.*?&amp;gt;/ /", -1, htmldecode(htmldecode(message))) as untagged_message length=32767,
strip(compbl(translate(calculated untagged_message, '     ', '08090a0b0c0d'x))) as message length=32767&amp;nbsp;/* six&amp;nbsp;spaces&amp;nbsp;between&amp;nbsp;the&amp;nbsp;quotes&amp;nbsp;*/&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;I have no idea why splitting it into two expressions and columns works. Does anybody have any suggestions?&lt;/P&gt;</description>
    <pubDate>Mon, 07 Sep 2020 22:38:59 GMT</pubDate>
    <dc:creator>LaurieF</dc:creator>
    <dc:date>2020-09-07T22:38:59Z</dc:date>
    <item>
      <title>SQL-expression 200-byte capping inconsistency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-expression-200-byte-capping-inconsistency/m-p/682073#M206441</link>
      <description>&lt;P&gt;I've been puzzling over this post (&lt;A href="https://communities.sas.com/t5/SAS-Programming/Case-statement-trims-length-to-200/m-p/336084/thread-id/76157" target="_self"&gt;Case-statement-trims-length-to-200&lt;/A&gt;) by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt; - and the weirdness continues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before I found Chris's extensive analysis of what was going on with his problem, I managed to get a Perl expression to work at more than 200 bytes. But I had to jump through hoops to do it. And I still don't know quite what's going on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A few of our source files have 32k-byte HTML text in them, which requires decoding and tag stripping. Not only that, but the often contain tabs, carriage returns and the like - all of which are a barrier for textual analysis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My original code was&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;strip(compbl(translate(prxchange("s/&amp;lt;.*?&amp;gt;/ /", -1, htmldecode(htmldecode(message))), '     ', '08090a0b0c0d'x))) as message length=32767&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It was unwieldy, but it worked. (htmldecode is called twice, because sometimes that's the only way of getting rid of the special characters - I don't control the source data!) And then&amp;nbsp;I looked at the output and saw that no value was longer than 200 bytes - often less because of utf-8.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But cutting out one function at a time,&amp;nbsp; I found that prxchange was working as expected. So I created an extra work column and added one more function at a time, and it still worked.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's what I've ended up with:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;prxchange("s/&amp;lt;.*?&amp;gt;/ /", -1, htmldecode(htmldecode(message))) as untagged_message length=32767,
strip(compbl(translate(calculated untagged_message, '     ', '08090a0b0c0d'x))) as message length=32767&amp;nbsp;/* six&amp;nbsp;spaces&amp;nbsp;between&amp;nbsp;the&amp;nbsp;quotes&amp;nbsp;*/&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;I have no idea why splitting it into two expressions and columns works. Does anybody have any suggestions?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Sep 2020 22:38:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-expression-200-byte-capping-inconsistency/m-p/682073#M206441</guid>
      <dc:creator>LaurieF</dc:creator>
      <dc:date>2020-09-07T22:38:59Z</dc:date>
    </item>
    <item>
      <title>Re: SQL-expression 200-byte capping inconsistency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-expression-200-byte-capping-inconsistency/m-p/682093#M206450</link>
      <description>&lt;P&gt;Yes, this whole thing is most messy for no good reason at all, and needs (needed? maybe it's too late) a good clean up.&lt;/P&gt;
&lt;P&gt;The data step behaves differently, and results are as expected. The SQL interpreter is the confused party here.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data T1;
 length X Y1 Z1 $32000;
 X=repeat('1234&amp;lt;zxcv&amp;gt; '||'090A'x,200);
 Y1=       prxchange("s/&amp;lt;.*?&amp;gt;|[\x08-\x0D]/ /", -1, htmldecode(htmldecode(X))) ;
 Z1=compbl(prxchange("s/&amp;lt;.*?&amp;gt;|[\x08-\x0D]/ /", -1, htmldecode(htmldecode(X))));
run;
proc sql;
  create table T2 as
  select X
        ,        prxchange("s/&amp;lt;.*?&amp;gt;|[\x08-\x0D]/ /", -1, htmldecode(htmldecode(X)))  as Y2 length=32000
        , compbl(prxchange("s/&amp;lt;.*?&amp;gt;|[\x08-\x0D]/ /", -1, htmldecode(htmldecode(X)))) as Z2 length=32000
  from T1;
quit;
data ALL;
  merge T1 T2;
  LY1=length(Y1);
  LY2=length(Y2);
  LZ1=length(Z1);
  LZ2=length(Z2);
  putlog LY1= LY2= LZ1= LZ2= ;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;LY1=1604 LY2=1604 LZ1=1004 LZ2=124&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 03:45:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-expression-200-byte-capping-inconsistency/m-p/682093#M206450</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-09-08T03:45:31Z</dc:date>
    </item>
  </channel>
</rss>

