I've been puzzling over this post (Case-statement-trims-length-to-200) by @ChrisNZ - and the weirdness continues.
Before I found Chris's extensive analysis of what was going on with his problem, I managed to get a Perl expression to work at more than 200 bytes. But I had to jump through hoops to do it. And I still don't know quite what's going on.
A few of our source files have 32k-byte HTML text in them, which requires decoding and tag stripping. Not only that, but they often contain tabs, carriage returns and the like - all of which are a barrier to textual analysis.
My original code was
strip(compbl(translate(prxchange("s/<.*?>/ /", -1, htmldecode(htmldecode(message))), ' ', '08090a0b0c0d'x))) as message length=32767
It was unwieldy, but it ran without errors. (htmldecode is called twice, because sometimes that's the only way of getting rid of the special characters - I don't control the source data!) And then I looked at the output and saw that no value was longer than 200 bytes - often fewer characters than that, because UTF-8 uses more than one byte for some characters.
By cutting out one function at a time, I found that prxchange on its own was working as expected. So I created an extra work column for the prxchange result and added the other functions back one at a time in a second column, and it still worked.
Here's what I've ended up with:
prxchange("s/<.*?>/ /", -1, htmldecode(htmldecode(message))) as untagged_message length=32767,
strip(compbl(translate(calculated untagged_message, '      ', '08090a0b0c0d'x))) as message length=32767 /* six spaces between the quotes */
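For anyone who wants to try this, here is a minimal sketch of how the two columns might sit in a complete PROC SQL step, followed by a quick check of the longest resulting value. The table names work.source_messages and work.clean_messages are hypothetical, as is the max_bytes column; only message and untagged_message come from my actual code.

```sas
/* Sketch only: work.source_messages and work.clean_messages are
   hypothetical table names; message is the source column. */
proc sql;
  create table work.clean_messages as
  select
    /* Step 1: decode entities twice, then replace every tag with a space */
    prxchange("s/<.*?>/ /", -1, htmldecode(htmldecode(message)))
      as untagged_message length=32767,
    /* Step 2: turn control characters into blanks, collapse runs of
       blanks, and trim leading and trailing blanks */
    strip(compbl(translate(calculated untagged_message, '      ', '08090a0b0c0d'x)))
      as message length=32767
  from work.source_messages;

  /* Check: longest cleaned value in bytes (max_bytes is a hypothetical name) */
  select max(length(message)) as max_bytes
  from work.clean_messages;
quit;
```

If max_bytes comes back above 200, the truncation is not happening in this form of the query. Note that the sketch keeps untagged_message in the output table; it can be dropped afterwards if it isn't needed.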
I have no idea why splitting it into two expressions and columns works. Does anybody have any suggestions?