Solved: Case-Folding span?

ybz12003 · Posted 06-23-2017 03:43 PM

Hello:

I don't understand the explanation of the replacement expression \u$1\L$2\E in the paper below. Could someone help?

http://analytics.ncsu.edu/sesug/2012/CT-03.pdf

The replacement expression then makes use of case-folding prefixes and spans to control the case of the capture buffers. \u makes the next character that follows it uppercase, which would be the d or m in first capture buffer. The case-folding span \L makes characters that follow lowercase until the end of the replacement or until disabled by \E, which is applied to the second capture buffer. Case-folding functionality was introduced in SAS V9.2.

data guest_list ;
  input attendees $30. ;
  datalines ;
MR and MRS DRaco Malfoy
mr and dr M Johnson
MrS. O.M. Goodness
DR. Evil
mr&mrs R. Miller
;
run ;

proc sql ;
  select attendees
       , prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io', -1, attendees ) as attendees2
    from guest_list
  ;
quit ;

kiranv_ · Posted 06-23-2017 04:11 PM

's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io'

Here dr, mr and mrs

can be represented as (d|m)(r(?!a)s?), which means the first character can be either d or m, and the second part is an r that is not followed by an a, optionally followed by an s.

| means "or"; ? means 0 or 1 time.

Whenever you put anything in parentheses, its value is captured.

As (d|m) is the first group, its value is captured and can be referenced in the replacement as $1.

The second captured value is (r(?!a)s?) and can be referenced as $2.

\u$1 means make the first character of your first captured value uppercase.

\L$2 means make your second captured value lowercase, up to the \E.

I have tried my best to explain. Please let me know if something is not clear or something appears wrong.
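
To make the difference between a case-folding prefix and a span concrete, here is a minimal sketch in a DATA step. The variable names (text, prefix, span, mixed) are made up for illustration, and the expected values in the comments assume that PRXCHANGE follows Perl's case-folding rules as described in the paper quoted above.

```sas
/* Sketch: prefix \u versus spans \U and \L in a PRXCHANGE replacement. */
/* All variable names here are hypothetical, chosen for illustration.   */
data _null_ ;
  length text prefix span mixed $30 ;

  text = 'mrs smith' ;

  /* \u is a prefix: it upcases only the next character.       */
  /* Expected: Mrs smith                                        */
  prefix = prxchange( 's/(mrs)/\u$1/i', 1, text ) ;

  /* \U is a span: it upcases everything up to \E.              */
  /* Expected: MRS smith                                        */
  span = prxchange( 's/(mrs)/\U$1\E/i', 1, text ) ;

  /* The expression from the thread: first letter up, rest down. */
  /* Expected: Mrs smith                                         */
  text  = 'mRS smith' ;
  mixed = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/i', 1, text ) ;

  put prefix= / span= / mixed= ;
run ;
```

The /o option from the original query is left out here because it only affects when the pattern is compiled, not what it matches.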

art297 · Posted 06-24-2017 05:04 PM

I'm still trying to learn regular expressions but, like you, I often find the terminology misleading.

Regardless, in this case, \u$1 simply means to upcase the one character (D or M) that was identified in the first capture. That falls under the description "case-folding prefixes". However, since the capture is only one character, it would work with either \u$1 or \U$1, since the uppercase \U starts a span that upcases everything that follows it.

The span is used in the other part, namely \L$2, which ensures that both the "r" and, if it exists, the "s" are shown in lower case. The \E ends the case-folding span (thus the term "span" in the terminology) but, in this case, it wasn't needed, as the span ends at the end of the replacement anyway.

It gets more confusing, though (to me at least), when you consider the first part of the expression, as it defines not only what is captured, but also which text will be changed. What I was most confused about, in this case, is why Ken chose to exclude cases where the character "a" immediately follows the "r" (the (?!a) negative lookahead). I, for the life of me, don't see why that was relevant.

Art, CEO, AnalystFinder.com
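
One way to see what the (?!a) lookahead and the trailing \E actually do is to run the first guest-list row through three variants of the expression. This is only a sketch: the variable names (with_la, without_la, no_e) are invented for illustration, the expected values in the comments assume Perl-style matching, and it shows the effect of the lookahead without claiming to know the paper author's reasons for including it.

```sas
/* Sketch: effect of the (?!a) lookahead and of dropping \E.      */
/* Variable names are hypothetical; the row is from the thread.   */
data _null_ ;
  length name with_la without_la no_e $30 ;

  name = 'MR and MRS DRaco Malfoy' ;

  /* Original pattern: "DR" followed by "a" is skipped, so the    */
  /* name DRaco is left untouched.                                */
  /* Expected: Mr and Mrs DRaco Malfoy                            */
  with_la = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/i', -1, name ) ;

  /* Without the lookahead, the "DR" in DRaco is treated like a   */
  /* title and case-folded as well.                               */
  /* Expected: Mr and Mrs Draco Malfoy                            */
  without_la = prxchange( 's/\b(d|m)(rs?)/\u$1\L$2\E/i', -1, name ) ;

  /* Dropping \E: the \L span simply runs to the end of the       */
  /* replacement, so the result should match with_la.             */
  no_e = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2/i', -1, name ) ;

  put with_la= / without_la= / no_e= ;
run ;
```

So the lookahead checks the character right after the "r", and it keeps words such as "DRaco" from being matched as if they began with the title "Dr".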
