Case-Folding span?

Hello:

 

I don't understand the explanation of the replacement expression \u$1\L$2\E in the paper linked below. Could someone help?

 

http://analytics.ncsu.edu/sesug/2012/CT-03.pdf

 

The replacement expression then makes use of case-folding prefixes and spans to control the case of the capture buffers. \u makes the next character that follows it uppercase, which would be the d or m in first capture buffer. The case-folding span \L makes characters that follow lowercase until the end of the replacement or until disabled by \E, which is applied to the second capture buffer. Case-folding functionality was introduced in SAS V9.2.

 

data guest_list ;
  input attendees $30. ;
datalines;
MR and MRS DRaco Malfoy
mr and dr M Johnson
MrS. O.M. Goodness
DR. Evil
mr&mrs R. Miller
;
run ;

proc sql ;
  select attendees
       , prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io', -1, attendees ) as attendees2
  from guest_list
  ;
quit ;

 


Accepted Solutions
Solution
‎06-25-2017 10:56 AM
PROC Star
Posts: 253

Re: Case-Folding span?

[ Edited ]

's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/io'

Here dr, mr and mrs are represented as (d|m)(r(?!a)s?), which means the first character can be either d or m, followed by an r that is not followed by an a, and then optionally an s.

| means "or"; ? means 0 or 1 time.

Whenever you put anything in parentheses, its value is captured.

As (d|m) is the first group, its value is captured and can be referenced in the replacement as $1.

The second captured value is (r(?!a)s?) and can be referenced as $2.

\u$1 means make the first captured value upper case. \u affects only the next character, which is enough here because $1 is a single character.

\L$2 means make the second captured value lower case; \E ends the lower-case span.

I have tried my best to explain. Please let me know if something is not clear or something appears wrong.
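
To see the two captures and the case-folding in action, here is a minimal sketch using the same expression from the paper in a DATA step. The test string and the variable names (before, after) are made up for illustration, and it assumes SAS 9.2 or later, where case-folding is available.

```sas
data _null_ ;
  length before after $40 ;
  before = 'MRS Smith and DR Jones' ;   /* hypothetical test value */

  /* $1 = the d or m, $2 = the r plus an optional s.            */
  /* \u upper-cases only the next character (the start of $1).  */
  /* \L lower-cases everything that follows until \E.           */
  after = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/i', -1, before ) ;

  /* MRS should come back as Mrs and DR as Dr; the names are not matched */
  put before= / after= ;
run ;
```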



All Replies
PROC Star
Posts: 7,363

Re: Case-Folding span?

I'm still trying to learn regular expressions but, like you, I often find the terminology misleading.

 

Regardless, in this case, \u$1 simply means to upcase the one character (D or M) that was identified in the first capture. That falls under the description "case-folding prefixes". However, since it was only one character, it would work with either \u$1 or \U$1, since the upper case U starts a span that covers the entire capture.

 

The span over the entire capture is used in the other part of the replacement, namely \L$2, which ensures that both the "r" and, if it exists, the "s" are shown in lower case. The \E ends the case-folding span (thus the term "span" used in the terminology) but, in this case, it wasn't needed, as the replacement expression ends there anyway.
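
The effect of \E is easier to see when something follows it in the replacement. The sketch below is not from the paper; the pattern, the test string and the variable names are invented for illustration. It compares a replacement that closes the \L span with \E against one that leaves it open.

```sas
data _null_ ;
  length text closed_span open_span $40 ;
  text = 'HELLO WORLD' ;   /* hypothetical test value */

  /* \L is switched off by \E, so only the first capture should be lower-cased */
  closed_span = prxchange( 's/(\w+) (\w+)/\L$1\E $2/', 1, text ) ;

  /* no \E: the \L span runs to the end of the replacement, covering both captures */
  open_span = prxchange( 's/(\w+) (\w+)/\L$1 $2/', 1, text ) ;

  put closed_span= / open_span= ;
run ;
```

In the expression from the paper nothing follows \E, which is why leaving it out should give the same result there.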

 

It gets more confusing, though (to me at least), when you consider the first part of the expression, as it defines not only what is captured, but also what will cause the string to be changed. What I was most confused about, in this case, is why Ken chose to exclude cases where the character "a" follows the "r". I, for the life of me, don't see why that was relevant.
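
One plausible reading, which the paper's author has not confirmed here, is that the (?!a) lookahead stops the pattern from treating the first two letters of a name as a title. The sample data contains "DRaco Malfoy", and without the lookahead the "DR" in "DRaco" would match. The sketch below compares the original pattern with a variant that drops the lookahead; the variant and the variable names are made up for illustration.

```sas
data _null_ ;
  length name with_guard without_guard $40 ;
  name = 'MR and MRS DRaco Malfoy' ;   /* first row of the sample data */

  /* original pattern: r must not be followed by a, so DRaco is not matched */
  with_guard = prxchange( 's/\b(d|m)(r(?!a)s?)/\u$1\L$2\E/i', -1, name ) ;

  /* hypothetical variant without the lookahead: the DR in DRaco matches too */
  without_guard = prxchange( 's/\b(d|m)(rs?)/\u$1\L$2\E/i', -1, name ) ;

  /* with_guard should keep DRaco as typed; without_guard should show Draco */
  put with_guard= / without_guard= ;
run ;
```

Note that the lookahead only guards against an "a"; a name such as DREW would still have its first two letters matched by either pattern.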

 

Art, CEO, AnalystFinder.com

 
