Two questions here. Since the same knowledgeable person is likely to answer both, I am asking them together.
Thank you for your insights.
1- Characters from the original string get in the replacement string without being requested
data _null_;
STR = 'abcc'||'03'x||'d';
* We match abcc, \3 means repeat group # 3 ;
REGEX = '/(a)(\w*)(.+)\3/ '; * grp1=a grp2=b grp3=c ;
link parse; * Changed: 1=a 2=b 3=c|_d ;
* We match abcc'03'x, \03 means octal 3 ;
REGEX = '/(a)(\w*)(.+)\03/'; * grp1=a grp2=bc grp3=c ;
link parse; * Changed: 1=a 2=bc 3=c|d ;
stop;
parse:
PRX1 = prxparse(REGEX);
call prxsubstr(PRX1, STR, POS, LEN);
put POS= LEN=;
PRX1 = prxparse(cats('s',REGEX,'1=\1 2=\2 3=\3|/'));
A= prxchange(PRX1, -1, STR);
put 'Changed: ' A /;
run;
Why does the character d get into the changed string (after the pipe character)? I never asked for it.
2- Group number 10 is created but not reused
data _null_;
STR = 'abcdefghijj'||'08'x||'b'; * 8 hex = 10 octal;
* We match abcdefghijj, \10 means group # 10;
REGEX = '/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\10/ '; * grp8=h grp9=i grp10=j POS=1 LEN=11 ;
link parse; * Changed: 8=h 9=i 10=a0|_b ;
* We match abcdefghijj'08'x, \010 means octal 10;
REGEX = '/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\010/'; * grp8=h grp9=ij grp10=j POS=1 LEN=12 ;
link parse; * Changed: 8=h 9=ij 10=a0|b ;
stop;
parse:
PRX1 = prxparse(REGEX);
call prxsubstr(PRX1, STR, POS, LEN);
put POS= LEN=;
PRX1 = prxparse(cats('s',REGEX,'8=\8 9=\9 10=\10|/'));
A= prxchange(PRX1, -1, STR);
put 'Changed: ' A /;
run;
Group number 10 is created, as shown by the length of the matched string (LEN=), but when I try to reuse it (after 10=),
\10 is interpreted as group 1 followed by a zero rather than as group 10. Is this a SAS limitation, or am I doing something I shouldn't?
PRXCHANGE changes only the matched substring within the full source string. The letter d at the end of STR is not part of the match, so it is not replaced and is retained in the result. Try adding a Z at the front of the STR value, for example, and you'll see this more clearly.
When specifying the replacement, use $ instead of \ to refer to the groups:
PRX1 = prxparse(cats('s',REGEX,'8=$8 9=$9 10=$10|/'));
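The first point can be illustrated with a minimal sketch. The test string and the two-group pattern below are made up for illustration and are not taken from the question. Only the matched substring is rewritten; whatever lies before or after the match is copied through as-is:

```sas
data _null_;
  STR = 'ZabcdZ';
  /* The pattern matches only "bc".                               */
  /* The leading "Za" and the trailing "dZ" lie outside the match, */
  /* so PRXCHANGE copies them to the result unchanged.            */
  A = prxchange('s/(b)(c)/1=$1 2=$2|/', -1, STR);
  put A=;   /* A=Za1=b 2=c|dZ */
run;
```

The d after the pipe is the same effect as in the question: it was never matched, so it was never replaced.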
Thank you!
1- Why do \n substitution groups work for single digit groups? Is it a tolerance?
2- abcdefg are not carried over to the changed string. So, to avoid having characters copied over, do they have to be in groups?
Or between groups, as in the name swap in the SAS documentation, where the comma is lost:
data ReversedNames;
NAME='Jones, Fred';
NAME2= prxchange('s/(\w+), (\w+)/$2 $1/', -1, NAME);
put NAME2=;
run;
NAME2=Fred Jones
The \10 worked as expected to FIND a match, it was just the wrong syntax for the substitution.
And the abcdefg wasn't 'carried over' because it was part of the matching sub-string which was replaced. Put a Z at the start of STR and you'll see.
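A small variation on the name swap may make this concrete. This is only a sketch, and the variable names DROP_COMMA and KEEP_COMMA are hypothetical. Literal text that is inside the match but outside any group disappears unless the replacement writes it back:

```sas
data _null_;
  NAME = 'Jones, Fred';
  /* The comma and blank are matched but not captured, so they are dropped. */
  DROP_COMMA = prxchange('s/(\w+), (\w+)/$2 $1/', -1, NAME);
  /* Writing the comma into the replacement text puts it back. */
  KEEP_COMMA = prxchange('s/(\w+), (\w+)/$2, $1/', -1, NAME);
  put DROP_COMMA= / KEEP_COMMA=;
  /* DROP_COMMA=Fred Jones  */
  /* KEEP_COMMA=Fred, Jones */
run;
```

So characters do not need to be in a group to be removed; they only need to be part of the matched substring.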
I love regular expressions but they really do my head in.
Yes I did add the Z and I understand now. I wasn't paying enough attention to the importance of how the "matching sub-string" is used when substituting.[1]
All is clear, thank you.
The last question is why \8 did work, not in the find part of the regex, but in 8=\8 in the substitution part of the regex.
[1] This is now also clear when regexes are used as a format, as shown in thread 281883, where the whole string has to be matched for the format to be applied.
It seems obvious now as I say it, but it puzzled me at first that we had to start and end with .*
Good question!
Looks like \num is equivalent to $num in a substitution - as long as num is a single digit. A syntax/context anomaly that a real expert might be able to comment on.
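For context, Perl's own documentation (perlre) describes \1 to \9 on the replacement side of a substitution as grandfathered for people used to sed, and recommends $1 instead. Whether SAS follows Perl exactly for two-digit numbers is an assumption here; the log in the question suggests \10 is read as group 1 followed by 0. A sketch that avoids the ambiguity by using $ throughout (the test string is made up for illustration):

```sas
data _null_;
  STR = 'abcdefghijZ';
  REGEX = '/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/';
  /* $9 and $10 refer to capture groups 9 and 10.          */
  /* With \10 here, the question's log showed "10=a0",     */
  /* i.e. group 1 followed by a literal 0.                 */
  A = prxchange(cats('s', REGEX, '9=$9 10=$10|/'), -1, STR);
  put A=;
run;
```

Using $n for every group reference in the replacement, single-digit or not, keeps the behaviour consistent.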