Hello:
I don't understand the explanation for the code \u$1\L$2\E. Could someone help?
http://analytics.ncsu.edu/sesug/2012/CT-03.pdf
The replacement expression then makes use of case-folding prefixes and spans
to control the case of the capture buffers. \u makes the next character that follows it
uppercase, which would be the d or m in first capture buffer. The case-folding span \L
makes characters that follow lowercase until the end of the replacement or until disabled by
\E, which is applied to the second capture buffer. Case-folding functionality was introduced
in SAS V9.2.
data guest_list ;
input attendees $30. ;
datalines;
MR and MRS DRaco Malfoy
mr and dr M Johnson
MrS. O.M. Goodness
DR. Evil
mr&mrs R. Miller
;
run ;
proc sql ;
select attendees
, prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io', -1, attendees ) as attendees2
from guest_list
;
quit ;
's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io'
Here dr, mr and mrs
can be represented as (d|m)(r(?!a)s?), which means the first character can be either d or m, and it is followed by an r that is not followed by an a, and then there may be an s.
| means "or" and ? means 0 or 1 time. \b is a word boundary, and the i option makes the match case-insensitive.
Whenever you put anything in parentheses, its value is captured.
As (d|m) is the first group, its value is captured and can be referenced as $1.
The second captured value is (r(?!a)s?) and can be referenced as $2.
\u$1 means make the first character of your first captured value upper case (here that value is only one character).
\L$2 means make your second captured value lower case, and \E ends the lower-casing.
I have tried my best to explain. Please let me know if something is not clear or something appears wrong.
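To see the two capture buffers for yourself, here is a small sketch that pulls them out with prxparse, prxmatch and prxposn. The variable names (title, first, second) and the two sample lines are just made up for illustration; the pattern is the same one as in the paper, without the replacement part.

```sas
data _null_ ;
  length title $30 first second $10 ;
  retain re ;
  /* compile the search pattern once, on the first iteration */
  if _n_ = 1 then re = prxparse( '/\b(d|m)(r(?!a)s?)/i' ) ;
  input title $30. ;
  /* prxmatch finds the first match; prxposn returns a capture buffer */
  if prxmatch( re, title ) then do ;
    first  = prxposn( re, 1, title ) ;   /* what $1 would hold */
    second = prxposn( re, 2, title ) ;   /* what $2 would hold */
    put title= first= second= ;
  end ;
  datalines ;
MRS DRaco Malfoy
dr M Johnson
;
run ;
```

For the first line the log should show first=M and second=RS, and for the second line first=d and second=r. These are the values that \u$1 and \L$2 then change the case of. Note this only looks at the first match in each line, while the prxchange call with -1 replaces every match.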
I'm still trying to learn regular expressions but, like you, I often find the terminology misleading.
Regardless, in this case, \u$1 simply means to upcase the one character (D or M) that was identified in the first capture. That falls under the description "case-folding prefixes". However, since it was only one character, it would work with either \u$1 or \U$1, since the upper case U starts a span rather than affecting only the next character.
The span is what is used in the other part, namely \L$2, which ensures that both the "r" and, if it exists, the "s" are shown in lower case. The \E ends the case-folding span (thus the term "span" used in the terminology) but, in this case, it wasn't needed, as the replacement ends right after $2 anyway.
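One way to check what \E does is to put some literal text after $2 in the replacement and run it with and without \E. This is only a sketch: the word TITLE, the sample string and the variable names are made up for the test and are not from the paper.

```sas
data _null_ ;
  length title with_E without_E $40 ;
  title = 'mRS. Goodness' ;
  /* \E closes the \L span before the literal text TITLE */
  with_E    = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E TITLE/i', -1, title ) ;
  /* no \E: the \L span is left open to the end of the replacement */
  without_E = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2 TITLE/i', -1, title ) ;
  put with_E= / without_E= ;
run ;
```

Going by the description quoted from the paper, the first call should leave TITLE in upper case and the second should lower-case it, since the span then runs to the end of the replacement. With nothing after $2, as in the original example, the two forms should give the same result.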
It gets more confusing, though (to me at least), when you consider the first part of the string, as it defines not only what is captured, but also what will cause the string to be changed. What I was most confused about, in this case, is why Ken chose to exclude cases where the character "a" follows the "r". I, for the life of me, don't see why that was relevant.
Art, CEO, AnalystFinder.com
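On the (?!a) question, a quick side-by-side run shows what the lookahead changes. This is a sketch with made-up variable names (with_la, without_la); the second pattern is the paper's pattern with the lookahead taken out, which is my own variation and not something from the paper.

```sas
data _null_ ;
  length name with_la without_la $30 ;
  name = 'MRS DRaco Malfoy' ;
  /* original pattern: the r must not be followed by an a */
  with_la    = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/i', -1, name ) ;
  /* same pattern with the negative lookahead removed */
  without_la = prxchange( 's/\b(d|m)(rs?)/\u$1\L$2\E/i', -1, name ) ;
  put with_la= / without_la= ;
run ;
```

With the lookahead, "DRaco" does not match at all and is left alone, so the result is "Mrs DRaco Malfoy". Without it, the "DR" at the start of "DRaco" is treated like a title and the result is "Mrs Draco Malfoy". So the (?!a) looks like a guard to keep the pattern from touching words that start with "dra" or "mra", although that is only my reading of it and it only covers that one letter.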