ChrisNZ
Tourmaline | Level 20

Two questions here. Since the same knowledgeable person is likely to answer both, I am asking them together.

Thank you for your insights.

1- Characters from the original string end up in the changed string without being requested

data _null_;
  STR   = 'abcc'||'03'x||'d';

  * First pattern matches abcc: \3 is a backreference to group 3;
  * grp1=a grp2=b grp3=c;
  REGEX = '/(a)(\w*)(.+)\3/ ';
  link parse;                   * Changed: 1=a 2=b 3=c|_d ;

  * Second pattern matches abcc'03'x: \03 is the character with octal code 3;
  * grp1=a grp2=bc grp3=c;
  REGEX = '/(a)(\w*)(.+)\03/';
  link parse;                   * Changed: 1=a 2=bc 3=c|d ;
  stop;

  parse:
    PRX1 = prxparse(REGEX);
    call prxsubstr(PRX1, STR, POS, LEN);
    put POS= LEN=;
    PRX1 = prxparse(cats('s',REGEX,'1=\1 2=\2 3=\3|/'));
    A = prxchange(PRX1, -1, STR);
    put 'Changed: ' A /;
  return;
run;

Why does the character d get into the changed string (after the pipe character)? I never asked for it.

2- Group number 10 is created but not reused

data _null_;
  STR  = 'abcdefghijj'||'08'x||'b';                  * 8 hex = 10 octal;

  * First pattern matches abcdefghijj: \10 is a backreference to group 10;
  * grp8=h grp9=i grp10=j      POS=1 LEN=11;
  REGEX = '/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\10/ ';
  link parse;                                         * Changed: 8=h 9=i 10=a0|_b ;

  * Second pattern matches abcdefghijj'08'x: \010 is the character with octal code 10;
  * grp8=h grp9=ij grp10=j     POS=1 LEN=12;
  REGEX = '/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\010/';
  link parse;                                         * Changed: 8=h 9=ij 10=a0|b ;
  stop;

  parse:
    PRX1 = prxparse(REGEX);
    call prxsubstr(PRX1, STR, POS, LEN);
    put POS= LEN=;
    PRX1 = prxparse(cats('s',REGEX,'8=\8 9=\9 10=\10|/'));
    A = prxchange(PRX1, -1, STR);
    put 'Changed: ' A /;
  return;
run;

Group number 10 is created, as shown by the length of the matched string (LEN=), but when I try to reuse it (after 10=), \10 is interpreted as group 1 followed by a zero rather than as group 10. Is this a SAS limitation, or am I doing something I shouldn't?

1 ACCEPTED SOLUTION
JerryLeBreton
Pyrite | Level 9

PRXCHANGE changes only the matched substring within the full source string. The letter d at the end of STR is not part of the match, so it is not replaced and is retained in the result. Try adding a Z at the front of the STR value, for example, and you'll see this more clearly.

When specifying the replacement, use $ instead of \ to refer to the groups:

PRX1 = prxparse(cats('s',REGEX,'8=$8 9=$9 10=$10|/'));   
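
To see the "matched substring" point in isolation, here is a minimal sketch. The test string Zabccd and the variable name A are only illustrative choices, not part of the original question; the pattern is the first one from question 1, written with $ references in the replacement.

```sas
data _null_;
  /* Illustrative string: Z in front of the match, d after it */
  STR = 'Zabccd';
  /* Only the matched part (abcc) is replaced; Z and d lie outside the match */
  A = prxchange('s/(a)(\w*)(.+)\3/1=$1 2=$2 3=$3|/', -1, STR);
  put A=;   /* expected: A=Z1=a 2=b 3=c|d */
run;
```

Both the leading Z and the trailing d should survive unchanged, because PRXCHANGE only rewrites the text that the pattern actually matched.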

ChrisNZ
Tourmaline | Level 20

Thank you!

1- Why do \n references work in the substitution for single-digit groups? Is it just tolerated?

2- abcdefg are not carried over to the changed string. So, to avoid characters being copied over, do they have to be in groups, or between groups as in the name swap from the SAS documentation, where the comma is lost?

data ReversedNames;
  NAME='Jones, Fred';
  NAME2= prxchange('s/(\w+), (\w+)/$2 $1/', -1, NAME);
  put NAME2=;
run;

NAME2=Fred Jones

JerryLeBreton
Pyrite | Level 9

The \10 worked as expected to  FIND a match, it was just the wrong syntax for the substitution.

And the abcdefg wasn't 'carried over' because it was part of the matching sub-string which was replaced. Put  a Z at the start of STR and you'll see.

I love regular expressions but they really do my head in.
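
A small sketch of the same idea, using a variation on the name swap: text is dropped only if it is inside the match, whether or not it sits in a group. The sample value 'Mr Jones, Fred Esq' and the variable names PARTIAL and WHOLE are made up for illustration.

```sas
data _null_;
  NAME = 'Mr Jones, Fred Esq';
  /* The match covers only "Jones, Fred": the text around it is kept */
  PARTIAL = prxchange('s/(\w+), (\w+)/$2 $1/', -1, NAME);
  /* The match is stretched over the whole string: the text around the names is dropped */
  WHOLE = prxchange('s/^.*?(\w+), (\w+).*$/$2 $1/', -1, NAME);
  put PARTIAL= / WHOLE=;
  /* expected: PARTIAL=Mr Fred Jones Esq */
  /*           WHOLE=Fred Jones          */
run;
```

In the second call the comma, "Mr" and "Esq" all disappear for the same reason: they are part of the matched text and the replacement does not put them back.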

ChrisNZ
Tourmaline | Level 20

Yes I did add the Z and I understand  now. I wasn't paying enough attention to the importance of how the "matching sub-string" is used when substituting.[1]

All is clear, thank you.

The last question is why \8 did work, not in the find part of the regex, but in  8=\8  in the substitution part of the regex.

[1] This is also clear now when regexes are used as a format, as shown in thread 281883, where the whole string has to be matched for the format to be applied.

It seems obvious now as I say it, but it puzzled me at first that we had to start and end with .* 

JerryLeBreton
Pyrite | Level 9

Good question! 

Looks like \num is equivalent to $num in a substitution - as long as num is a single digit.  A syntax/context anomaly that a real expert might be able to comment on.
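
For anyone who wants to compare the two reference styles side by side, a minimal sketch follows. It reuses the ten-group pattern from the question on a shortened string (an assumption made to keep the output printable); the variable names BACKSLASH and DOLLAR are illustrative only.

```sas
data _null_;
  /* Shortened version of the string from question 2, without '08'x and b */
  STR = 'abcdefghijj';
  /* Backslash form: the thread reports \10 in the replacement being read as group 1 followed by a 0 */
  BACKSLASH = prxchange('s/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\10/10=\10|/', -1, STR);
  /* Dollar form: the syntax recommended in the accepted solution for referring to group 10 */
  DOLLAR = prxchange('s/(a)(b)(c)(d)(e)(f)(g)(h)(\w*)(.+)\10/10=$10|/', -1, STR);
  put BACKSLASH= / DOLLAR=;
run;
```

Comparing the two values in the log shows how each form is interpreted in your SAS release. Note that \10 inside the search part of the pattern is the same in both calls; only the replacement differs.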
