Maplefin
Obsidian | Level 7

Hi, I've recently started learning Perl regular expressions. I wonder how to change the case of a single character in a string. I have read the SAS Help documentation and know that \u, \U, \l and \L can be used to change case. The example in SAS Help confused me:

data _null_;
   x = 'MCLAUREN';
   x = prxchange("s/(MC)/\u\L$1/i", -1, x);
   put x=;
run;
SAS writes the following output to the log:
x=McLAUREN

What are the rules behind this? For example, I want to turn the string "ABC" into "aBc". What is the right Perl regular expression for that? I tried to imitate the code:

data _null_;
x="ABC";
y=prxchange("s/(abc)/\l\u\l$1/i",-1,x);
put x= y=;
run;

but got the wrong result:

x=ABC y=aBC

So, how can I use a Perl regular expression to get the result I want? In another situation, I want to change "ADAM" into "ADaM".

FreelanceReinh
Jade | Level 19

Hi @Maplefin and welcome to the SAS Support Communities!

 

The rule in your first example is: Every occurrence (due to "-1") of "MC" in x, matched case-insensitively (due to "/i"), is replaced with its lowercase version (\L), but with the first character in uppercase (\u), i.e., it is replaced with "Mc".
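
To see the four case metacharacters side by side, here is a minimal sketch. The sample value and the variable names y1-y4 are made up for illustration. It assumes, as described above, that \u and \l change only the next character, while \U and \L change everything that follows up to \E or the end of the replacement text:

```sas
data _null_;
   length x y1-y4 $10;
   x  = 'mcLAUREN';
   /* Each call wraps the whole word in $1 and applies one metacharacter to it. */
   y1 = prxchange("s/(\w+)/\u$1/", 1, x);  /* \u: next character to uppercase        */
   y2 = prxchange("s/(\w+)/\U$1/", 1, x);  /* \U: all following characters uppercase */
   y3 = prxchange("s/(\w+)/\l$1/", 1, x);  /* \l: next character to lowercase        */
   y4 = prxchange("s/(\w+)/\L$1/", 1, x);  /* \L: all following characters lowercase */
   put x= / y1= / y2= / y3= / y4=;
run;
```

Since the sample value already starts with a lowercase letter, \l should leave it unchanged. Try a value such as 'MCLAUREN' to see \l at work.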

 

The rule in your second example is: If "abc" (case-insensitive due to "/i") is found (anywhere due to "-1") in x, replace it with itself, but with the first character in "lowercase -- no, uppercase! -- no, lowercase!!" (\l\u\l), i.e., replace the first character with "a" and leave the other two characters unchanged. Of course, the replacement expression \l\u\l$1 can be simplified to \l$1. To obtain the constant result "aBc" I would rather specify the replacement expression explicitly. The same applies to your example "ADaM" where the intention is to correct the possibly misspelled CDISC abbreviation for "Analysis Data Model."

data _null_;
length x y $10;
input x;
y=prxchange("s/\bADAM\b/ADaM/i",1,x);
put (x y)($10.);
cards;
ADAM
Adam
adam
aDaM
Madam
Adamsky
;

(\b stands for a word boundary.)

 

Result:

ADAM      ADaM
Adam      ADaM
adam      ADaM
aDaM      ADaM
Madam     Madam
Adamsky   Adamsky
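
The same explicit-replacement approach could be applied to the "ABC" to "aBc" case from the question. This is only a sketch that mirrors the ADaM step above; the test values are made up, and "ABCD" is included on the assumption that \b will keep it from matching:

```sas
data _null_;
   length x y $10;
   input x;
   /* Explicit replacement text: any casing of the whole word "abc" becomes "aBc". */
   y = prxchange("s/\babc\b/aBc/i", 1, x);
   put (x y)($10.);
cards;
ABC
abc
Abc
ABCD
;
```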

 

The metacharacters \u, \U, \l and \L are particularly useful if there's a general rule like converting names to proper case with special attention to names starting with "Mc":

data _null_;
length x y $15;
input x;
y=prxchange("s/(MC)?([a-z])([a-z]*)/\u\L$1\E\u$2\L$3/i",1,x);
put (x y)($15.);
cards;
MCLAUREN
mcmaster
McCOY
mCpHeRson
Mcx
SMITH
doe
Adamczyk
;

Rule: Replace "MC" (case-insensitive), if present (metacharacter "?") and if these are the first letters in x, with "Mc" (\u\L$1\E), the following letter with its uppercase version (\u$2), and write the rest in lowercase (\L$3).

 

Result:

MCLAUREN       McLauren
mcmaster       McMaster
McCOY          McCoy
mCpHeRson      McPherson
Mcx            McX
SMITH          Smith
doe            Doe
Adamczyk       Adamczyk

Edit:

The same result can be obtained with a simpler expression:

y=prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i",1,x);

Note that the effect of the leading "\u" in the replacement expression is not limited to the "MC" part: If "MC" is not found (as in "SMITH", "doe", etc.), $1 is empty and the character written in uppercase is the first character of the remaining expression \u$2\L$3 (or \u\L$2 in the simplified version), which luckily is meant to be an uppercase letter anyway. However, if the remaining expression were \L$2$3 (or \L$2), the leading \u would override the \L for that first character!
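
To check this carry-over of \u on your own SAS release, a small comparison step might look like the sketch below. The variable names y1 and y2 and the three test values are chosen for illustration. y2 uses the hypothetical variant without the second \u, so any difference between y1 and y2 shows what the leading \u does when $1 is empty:

```sas
data _null_;
   length x y1 y2 $15;
   input x;
   /* y1: the simplified expression from the edit above */
   y1 = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i", 1, x);
   /* y2: hypothetical variant that drops the second \u, for comparison only */
   y2 = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\L$2/i", 1, x);
   put (x y1 y2)($15.);
cards;
MCLAUREN
SMITH
doe
;
```

If the note above holds, y2 for "SMITH" and "doe" should still begin with an uppercase letter, whereas the letter after "Mc" in "MCLAUREN" should come out in lowercase.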

Maplefin
Obsidian | Level 7

Thanks a lot! It will be helpful.

Ksharp
Super User
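data _null_;
/* The trailing blank makes x four characters long, so "ADAM" fits below. */
x="ABC ";
/* One capture group per letter; \l and \u each change only the next character. */
y=prxchange("s/(a)(b)(c)/\l$1\u$2\l$3/i",-1,x);
put x= y=;

x="ADAM";
/* \u$1 changes only the first letter of "AD", so this relies on all-uppercase input. */
y=prxchange("s/(ad)(a)(m)/\u$1\l$2\u$3/i",-1,x);
put x= y=;

run;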