Maplefin
Fluorite | Level 6

Hi, I have recently started learning Perl regular expressions, and I wonder how to change the case of single characters in a string. From the SAS Help documentation I know that \u, \U, \l and \L can be used to change case, but the example there confused me:

data _null_;
   x = 'MCLAUREN';
   x = prxchange("s/(MC)/\u\L$1/i", -1, x);
   put x=;
run;
SAS writes the following output to the log:
x=McLAUREN

What are the rules behind this? For example, I want to turn the string "ABC" into "aBc". What is the right Perl regular expression for that? I tried to imitate the code:

data _null_;
x="ABC";
y=prxchange("s/(abc)/\l\u\l$1/i",-1,x);
put x= y=;
run;

but got the wrong result:

x=ABC y=aBC

So, how can I use a Perl regular expression to get the result I want? In another situation, I want to change "ADAM" into "ADaM".

Accepted Solution
FreelanceReinh
Jade | Level 19

Hi @Maplefin and welcome to the SAS Support Communities!

 

The rule in your first example is: If "MC" (case-insensitive due to "/i") is found in x (every occurrence, due to "-1"), replace it with the lowercase (\L) version of it, but with the first character in uppercase (\u), i.e., replace it with "Mc".
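
To see the modifiers in isolation, here is a minimal sketch (not from the SAS Help example; the variable names are made up for illustration). It assumes the usual Perl semantics that PRXCHANGE follows: \u and \l act only on the next character, while \U and \L act on everything up to \E or the end of the replacement.

```sas
data _null_;
   a = 'mcLAUREN';
   b = 'MCLAUREN';
   /* \U: uppercase everything that follows; expected MCLAUREN */
   upper_all = prxchange("s/(\w+)/\U$1/", 1, a);
   /* \L: lowercase everything that follows; expected mclauren */
   lower_all = prxchange("s/(\w+)/\L$1/", 1, b);
   /* \u: uppercase only the next character; expected McLAUREN */
   upper_1st = prxchange("s/(\w+)/\u$1/", 1, a);
   /* \l: lowercase only the next character; expected mCLAUREN */
   lower_1st = prxchange("s/(\w+)/\l$1/", 1, b);
   /* \u\L: lowercase everything, then uppercase the first character; expected Mclauren */
   proper    = prxchange("s/(\w+)/\u\L$1/", 1, b);
   put upper_all= / lower_all= / upper_1st= / lower_1st= / proper=;
run;
```

In the SAS Help example only "MC" is captured in $1, so \u\L touches just those two characters and the rest of the name ("LAUREN") is left as it was.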

 

The rule in your second example is: If "abc" (case-insensitive due to "/i") is found in x, replace it with itself, but with the first character in "lowercase -- no, uppercase! -- no, lowercase!!" (\l\u\l), i.e., replace the first character with "a" and leave the other two characters unchanged. Both \l and \u affect only the single character that follows, so stacking them never reaches the second or third character, and the replacement expression \l\u\l$1 can be simplified to \l$1. To obtain the constant result "aBc" I would rather specify the replacement expression explicitly. The same applies to your example "ADaM", where the intention is to correct the possibly misspelled CDISC abbreviation for "Analysis Data Model":

data _null_;
length x y $10;
input x;
y=prxchange("s/\bADAM\b/ADaM/i",1,x);
put (x y)($10.);
cards;
ADAM
Adam
adam
aDaM
Madam
Adamsky
;

(\b stands for a word boundary, so "ADAM" is replaced only when it is a whole word.)

 

Result:

ADAM      ADaM
Adam      ADaM
adam      ADaM
aDaM      ADaM
Madam     Madam
Adamsky   Adamsky
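
The same literal-replacement idea can be sketched for the "ABC" example from the question. This is only an illustration with made-up test values, assuming that "abc" should be changed only when it stands as a whole word:

```sas
data _null_;
   length x y $10;
   input x;
   /* Literal replacement: any casing of the whole word "abc" becomes "aBc" */
   y = prxchange("s/\babc\b/aBc/i", 1, x);
   put (x y)($10.);
cards;
ABC
abc
Abc
cabcab
;
```

The first three values should come out as "aBc", while "cabcab" should stay unchanged because of the word boundaries.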

 

The metacharacters \u, \U, \l and \L are particularly useful if there's a general rule like converting names to proper case with special attention to names starting with "Mc":

data _null_;
length x y $15;
input x;
y=prxchange("s/(MC)?([a-z])([a-z]*)/\u\L$1\E\u$2\L$3/i",1,x);
put (x y)($15.);
cards;
MCLAUREN
mcmaster
McCOY
mCpHeRson
Mcx
SMITH
doe
Adamczyk
;

Rule: Replace "MC" (case-insensitive), if any (metacharacter "?") and if these are the first letters in x, with "Mc" (\u\L$1\E), replace the following letter with its uppercase version (\u$2) and write the rest in lowercase (\L$3).

 

Result:

MCLAUREN       McLauren
mcmaster       McMaster
McCOY          McCoy
mCpHeRson      McPherson
Mcx            McX
SMITH          Smith
doe            Doe
Adamczyk       Adamczyk

Edit:

The same result can be obtained with a simpler expression:

y=prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i",1,x);

Note that the impact of the leading "\u" in the replacement expression is not limited to the "MC" part: If "MC" is not found (as in "SMITH", "doe", etc.), $1 will be empty and the character written in uppercase will be the first character of the remaining expression \u$2\L$3 (or \u\L$2 in the simplified version), which fortunately is meant to be an uppercase letter anyway. However, if the remaining expression were \L$2$3 (or \L$2), the leading \u would override the \L!
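
One way to check this note for yourself is to run both variants side by side. The following is just a sketch; y1 and y2 are names chosen for illustration, and y2 uses the hypothetical remainder \L$2 mentioned above:

```sas
data _null_;
   length x y1 y2 $15;
   input x;
   /* y1: the simplified expression from the edit above */
   y1 = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i", 1, x);
   /* y2: same pattern, but the remainder is only lowercased (\L$2) */
   y2 = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\L$2/i", 1, x);
   put (x y1 y2)($15.);
cards;
MCLAUREN
SMITH
doe
;
```

According to the note, y2 should still start with an uppercase letter for "SMITH" and "doe", because the leading \u carries over to $2 when $1 is empty. For "MCLAUREN" the two columns should differ, since y2 never uppercases the letter after "Mc".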

Maplefin
Fluorite | Level 6

Thanks a lot! It will be helpful.

Ksharp
Super User
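data _null_;
/* Capture each part in its own group, then set its case individually */
x="ABC ";
y=prxchange("s/(a)(b)(c)/\l$1\u$2\l$3/i",-1,x);  /* a -> lower, b -> upper, c -> lower */
put x= y=;

/* \u and \l change only the first character of a group, */
/* so \u$1 leaves the "D" of "AD" exactly as it was entered */
x="ADAM";
y=prxchange("s/(ad)(a)(m)/\u$1\l$2\u$3/i",-1,x);
put x= y=;

run;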
