Dear all,
I expect to do some punctuation cleaning.
1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.
2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,
“””%””” and not “””%””%”””), should have quotation marks removed.
3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.
4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the end of a name that are not relevant should be removed.
Could you please give me some suggestions?
thanks in advance.
or regular expressions.
data HAVE;
STR='"" a "n "" '; output;
STR='" a n " '; output;
STR='"" a n "" '; output;
STR='~~1~~ '; output;
STR='#~1~# '; output;
run;
data WANT;
set HAVE;
*2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
*1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR));
*1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
*34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR));
put STR=;
run;
STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#
If you're feeling adventurous, learning pearl regular expressions is likely your best overall solution for manipulating text data.
You can put some of your test data here and test your regular expressions. I find this helpful for building mine.
@Alexxxxxxx wrote:
Dear all,
I expect to do some punctuation cleaning.
1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.
2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,
“””%””” and not “””%””%”””), should have quotation marks removed.
3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.
4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the end of a name that are not relevant should be removed.
Could you please give me some suggestions?
thanks in advance.
or regular expressions.
data HAVE;
STR='"" a "n "" '; output;
STR='" a n " '; output;
STR='"" a n "" '; output;
STR='~~1~~ '; output;
STR='#~1~# '; output;
run;
data WANT;
set HAVE;
*2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
*1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR));
*1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
*34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR));
put STR=;
run;
STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#
There are heaps of tutorials online for pearl regular expressions.
And validations sites too, like like one: https://regex101.com/ whichi I use.
I will explain how your expression is parsing your text.
My book has a chapter with a dictionary of all SAS-supported expressions (with short examples but no tutorial).
For example my first expression (I removed the ? as it's not needed)
\A""([^"]*)""\Z
reads as:
\A start of string
"" 2 quotes
() a group
[^"] anything but double quotes
* match as many as possible
so all 3 together mean:
([^"]*?)
capture all the non-quote characters you can and put them in a group
"" 2 quotes
\Z end of string
So the whole expression reads: match a word starting with 2 quotes, then no quotes,then ending with 2 quotes.
The second part
"$1"
replaces everything matched with: a quote then the capture group then a quote.
I could have written the expression
STR=prxchange('s/\A"("[^"]*")"\Z/$1/',-1,trim(STR));
to the same effect
The very first letter can be m (often omitted) for matching or s for substituting.
It's a steep learning curve, but I taught myself so you can too. 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.