If you rerun the data through the same program, the doubling occurs because on the second pass the already-processed data contains UPCASE(STRIP(DECODE)). The pattern matches the STRIP(DECODE) inside it again, so it becomes UPCASE(UPCASE(STRIP(DECODE))).
As for the meaning of the regular expression patterns:
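A minimal sketch of the rerun effect, applying the same prxchange call twice to one value. The starting value strip(decode) and the length of 200 are assumptions for illustration, not taken from your data.

```sas
data _null_;
  length lexpr $200;                /* assumed length, wide enough for the growth */
  lexpr = 'strip(decode)';          /* hypothetical unprocessed expression */

  /* first run: wraps the STRIP call once */
  lexpr = prxchange('s/(\bstrip\b)\s*(\(.+?\))/UPCASE(\U$1$2)/i', -1, lexpr);
  put 'Run 1: ' lexpr;              /* UPCASE(STRIP(DECODE)) */

  /* second run on the already-processed value: STRIP(DECODE) matches again */
  lexpr = prxchange('s/(\bstrip\b)\s*(\(.+?\))/UPCASE(\U$1$2)/i', -1, lexpr);
  put 'Run 2: ' lexpr;              /* UPCASE(UPCASE(STRIP(DECODE))) */
run;
```

The pattern has no way to know that a STRIP call is already wrapped, so every additional run adds another UPCASE layer.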
lexpr = prxchange ( 's/(\bstrip\b)\s*(\(.+?\))/UPCASE(\U$1$2)/i', -1, lexpr);
arg1, regex search pattern and replace instructions
s - substitute, as in s/ find / replace /
arg2, -1, globally, as in everywhere in the arg3 string
arg3, string to process
(\bstrip\b)\s*(\(.+?\)) , search pattern. find the word strip followed by 0 or more spaces followed by an open parenthesis, one or more characters and a close parenthesis. The pattern does NOT check for nested parentheses.
\b word boundary
( , open parenthesis starting capture group #1. the contents of a capture group can be used in the replacement
strip , literally, the letters of the word "strip"
) , close parenthesis closing capture group #1.
\s* , zero or more whitespace characters
( , open parenthesis starting capture group #2. the contents of a capture group can be used in the replacement
\( , an escaped open parenthesis means a literal ( character
.+? , . any character, + one or more times, ? makes the 'or more' non-greedy: it matches as few characters as possible and stops at the first position where the rest of the pattern (the literal close parenthesis) can match
\) , an escaped close parenthesis means a literal ) character, as would be expected for STRIP(decOd). Nested calls such as STRIP(substr(decode,3,5)) are not handled properly, because the match ends at the first close parenthesis rather than the one that closes STRIP
) , close parenthesis closing capture group #2.
UPCASE(\U$1$2) , how to replace what was found
UPCASE , literally the letters UPCASE
( , literally an open parenthesis (as would be needed for an UPCASE function call in source code)
\U , start instruction to upper case whatever follows, ends at either \E or end of replacement instructions
$1 , the first capture group, the matched word strip in whatever case it was found; \U turns it into STRIP
$2 , the second capture group, the parenthesized argument of the original STRIP(), parentheses included; \U uppercases it as well
) , literally a close parenthesis (as would be needed to close an UPCASE function call in source code)
i , ignore case during search
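The breakdown above can be tried out with a short data step. This is a sketch only: the variable result, the length of 200, and both input values are assumptions made up for illustration. The second input shows the nesting limitation.

```sas
data _null_;
  length lexpr result $200;         /* assumed lengths */

  /* hypothetical inputs: a simple call, then a call with a nested function */
  do lexpr = 'Strip (decOd)', 'strip(cats(a,b)||c)';
    result = prxchange('s/(\bstrip\b)\s*(\(.+?\))/UPCASE(\U$1$2)/i', -1, lexpr);
    put lexpr= / result= /;
  end;
  /* 'Strip (decOd)'       -> UPCASE(STRIP(DECOD))
       the blank matched by \s* is not captured, so it is dropped        */
  /* 'strip(cats(a,b)||c)' -> UPCASE(STRIP(CATS(A,B))||c)
       the match ended at the first ), so STRIP now covers only CATS(A,B) */
run;
```

In the nested case the parentheses still balance, but the meaning changes: ||c ends up outside the STRIP call and is left as it was in the text.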
lexpr = prxchange ( 's/(["''])(.*?)\1/$1\U$2\E$1/', -1, lexpr); , find a quoted string literal and replace its contents with the uppercase version.
Search
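(["'']) , Capture group 1, matches either " or '. The doubled '' is how a single quote is written inside a single-quoted SAS literal, so the regex engine sees (["'])
[] , character class, any one of the listed characters will match
(.*?) , Capture group 2, zero or more of any character, non-greedy, stops at the first position where the rest of the pattern (\1) matches
\1 , backreference to the contents of the first capture group, so the closing quote must be the same character (" or ') that opened the string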
Replace
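$1\U$2\E$1 , the original string with its contents uppercased: the opening quote, then group 2 uppercased between \U and \E, then the same quote again to close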
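A small sketch of the quoted-string pattern at work. The input expression and the length of 200 are assumptions for illustration; it contains one double-quoted and one single-quoted literal.

```sas
data _null_;
  length lexpr $200;                        /* assumed length */

  /* hypothetical value: x = "abc" or y = 'def' */
  lexpr = 'x = "abc" or y = ''def''';

  /* uppercase only the text between matching quotes */
  lexpr = prxchange('s/(["''])(.*?)\1/$1\U$2\E$1/', -1, lexpr);

  put lexpr=;                               /* lexpr=x = "ABC" or y = 'DEF' */
run;
```

The text outside the quotes (x, or, y) is left alone because only capture group 2 sits between \U and \E.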
What does all the complication mean, in either your tranwrd or my regexes? Probably that they are a bit of a Rube Goldberg machine that could be simplified with a better process. Perhaps you need to look at alternatives, per @ballardw. Better data means less code. Does program source code necessarily have to be in data set variables?