Hi, I'm currently learning Perl regular expressions, and I wonder how to change the case of single characters in strings. I have read the SAS Help documentation and know that \u, \U, \l and \L can be used to change the case. The example in SAS Help confused me:
data _null_;
x = 'MCLAUREN';
x = prxchange("s/(MC)/\u\L$1/i", -1, x);
put x=;
run;
Result:
x=McLAUREN
What are the rules behind this? For example, I want to turn the string "ABC" into "aBc". How do I write the right Perl regular expression? I tried to imitate the code:
data _null_;
x="ABC";
y=prxchange("s/(abc)/\l\u\l$1/i",-1,x);
put x= y=;
run;
but got the wrong result:
x=ABC y=aBC
So, how can I use a Perl regular expression to get the result I want? In another situation, I want to change "ADAM" into "ADaM".
Hi @Maplefin and welcome to the SAS Support Communities!
The rule in your first example is: If "MC" (case-insensitive due to "/i") is found (anywhere due to "-1") in x, replace it with the lowercase (\L) version of it, but with the first character in uppercase (\u), i.e., replace it with "Mc".
The rule in your second example is: If "abc" (case-insensitive due to "/i") is found (anywhere due to "-1") in x, replace it with itself, but with the first character in "lowercase -- no, uppercase! -- no, lowercase!!" (\l\u\l), i.e., replace the first character with "a" and leave the other two characters unchanged. Of course, the replacement expression \l\u\l$1 can be simplified to \l$1. To obtain the constant result "aBc" I would rather specify the replacement expression explicitly. The same applies to your example "ADaM" where the intention is to correct the possibly misspelled CDISC abbreviation for "Analysis Data Model."
data _null_;
length x y $10;
input x;
y=prxchange("s/\bADAM\b/ADaM/i",1,x);
put (x y)($10.);
cards;
ADAM
Adam
adam
aDaM
Madam
Adamsky
;
(\b stands for word boundary).
Result:
ADAM      ADaM
Adam      ADaM
adam      ADaM
aDaM      ADaM
Madam     Madam
Adamsky   Adamsky
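For the "ABC" string from the question, a minimal sketch of the replacement strings discussed above might look like the following. It contrasts \l\u\l$1 with its simplified form \l$1 and with an explicit constant replacement, and a second step shows what the \b word boundaries prevent in the ADaM case. The dataset names WORK.EXPLICIT_DEMO and WORK.BOUNDARY_DEMO and all variable names are made up for illustration.

```sas
/* Hypothetical dataset and variable names, for illustration only */
data work.explicit_demo;
   length x y_orig y_lfirst y_explicit $10;
   x = "ABC";
   /* replacement from the question: \l \u \l all target the same first character */
   y_orig     = prxchange("s/(abc)/\l\u\l$1/i", -1, x);
   /* simplified form: \l lowercases only the next character */
   y_lfirst   = prxchange("s/(abc)/\l$1/i", -1, x);
   /* constant replacement text, independent of the input case */
   y_explicit = prxchange("s/abc/aBc/i", -1, x);
   put (x y_orig y_lfirst y_explicit) (=);
run;

data work.boundary_demo;
   length x y_bound y_nobound $10;
   input x;
   /* whole word only, as in the answer */
   y_bound   = prxchange("s/\bADAM\b/ADaM/i", 1, x);
   /* without \b the pattern also matches inside longer words */
   y_nobound = prxchange("s/ADAM/ADaM/i", 1, x);
   put (x y_bound y_nobound) (=);
cards;
Madam
Adamsky
ADAM
;
run;
```

Without the word boundaries, "Madam" would become "MADaM" and "Adamsky" would become "ADaMsky", which is why the answer anchors the pattern with \b.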
The metacharacters \u, \U, \l and \L are particularly useful if there's a general rule like converting names to proper case with special attention to names starting with "Mc":
data _null_;
length x y $15;
input x;
y=prxchange("s/(MC)?([a-z])([a-z]*)/\u\L$1\E\u$2\L$3/i",1,x);
put (x y)($15.);
cards;
MCLAUREN
mcmaster
McCOY
mCpHeRson
Mcx
SMITH
doe
Adamczyk
;
Rule: Replace "MC" (case-insensitive), if any (metacharacter "?") and if these are the first letters in x, with "Mc" (\u\L$1\E), replace the following letter with its uppercase version (\u$2), and write the rest in lowercase (\L$3).
Result:
MCLAUREN       McLauren
mcmaster       McMaster
McCOY          McCoy
mCpHeRson      McPherson
Mcx            McX
SMITH          Smith
doe            Doe
Adamczyk       Adamczyk
Edit:
The same result can be obtained with a simpler expression:
y=prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i",1,x);
Note that the impact of the leading "\u" in the replacement expression is not limited to the "MC" part: If "MC" is not found (as in "SMITH", "doe", etc.), $1 will be empty and the character to be written in uppercase will be the first character of the remaining expression \u$2\L$3 (or \u\L$2 in the simplified version), which luckily is intended to be an uppercase letter anyway. However, if the remaining expression were \L$2$3 (or \L$2), the leading \u would override the \L!
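To try the claims above on your own SAS session, a sketch like the following runs the original expression, the simplified one, and a hypothetical variant without the second \u on the sample names. The variable SAME flags whether the first two agree; the variant column is only printed for inspection, since its exact behavior is what the note describes for SAS. WORK.MC_COMPARE and all variable names are made up for illustration.

```sas
/* Hypothetical dataset and variable names, for illustration only */
data work.mc_compare;
   length x y_orig y_simple y_variant $15;
   input x;
   /* expression from the answer */
   y_orig    = prxchange("s/(MC)?([a-z])([a-z]*)/\u\L$1\E\u$2\L$3/i", 1, x);
   /* simplified expression from the edit */
   y_simple  = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\u\L$2/i", 1, x);
   /* hypothetical variant without the second \u; per the note,
      the leading \u may still act on the first letter when "MC" is absent */
   y_variant = prxchange("s/(MC)?([a-z]+)/\u\L$1\E\L$2/i", 1, x);
   /* 1 if the original and simplified expressions agree, else 0 */
   same = (y_orig = y_simple);
   put (x y_orig y_simple y_variant same) (=);
cards;
MCLAUREN
mcmaster
McCOY
mCpHeRson
Mcx
SMITH
doe
Adamczyk
;
run;
```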
Thanks a lot! It will be helpful.
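data _null_;
x="ABC ";
/* each letter is its own group, so each escape sets that letter's case: aBc */
y=prxchange("s/(a)(b)(c)/\l$1\u$2\l$3/i",-1,x);
put x= y=;
x="ADAM";
/* \u and \l affect only the first character of the group they precede, */
/* so the second character of "ad" keeps the case it has in x            */
y=prxchange("s/(ad)(a)(m)/\u$1\l$2\u$3/i",-1,x);
put x= y=;
run;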