Dear all,
I expect to do some punctuation cleaning.
1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.
2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,
“””%””” and not “””%””%”””), should have quotation marks removed.
3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.
4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the end of a name that are not relevant should be removed.
Could you please give me some suggestions?
thanks in advance.
or regular expressions.
data HAVE;
STR='"" a "n "" '; output;
STR='" a n " '; output;
STR='"" a n "" '; output;
STR='~~1~~ '; output;
STR='#~1~# '; output;
run;
data WANT;
set HAVE;
*2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
*1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR));
*1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
*34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR));
put STR=;
run;
STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#
If you're feeling adventurous, learning pearl regular expressions is likely your best overall solution for manipulating text data.
You can put some of your test data here and test your regular expressions. I find this helpful for building mine.
@Alexxxxxxx wrote:
Dear all,
I expect to do some punctuation cleaning.
1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.
2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,
“””%””” and not “””%””%”””), should have quotation marks removed.
3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.
4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the end of a name that are not relevant should be removed.
Could you please give me some suggestions?
thanks in advance.
or regular expressions.
data HAVE;
STR='"" a "n "" '; output;
STR='" a n " '; output;
STR='"" a n "" '; output;
STR='~~1~~ '; output;
STR='#~1~# '; output;
run;
data WANT;
set HAVE;
*2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
*1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR));
*1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
*34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR));
put STR=;
run;
STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#
There are heaps of tutorials online for pearl regular expressions.
And validations sites too, like like one: https://regex101.com/ whichi I use.
I will explain how your expression is parsing your text.
My book has a chapter with a dictionary of all SAS-supported expressions (with short examples but no tutorial).
For example my first expression (I removed the ? as it's not needed)
\A""([^"]*)""\Z
reads as:
\A start of string
"" 2 quotes
() a group
[^"] anything but double quotes
* match as many as possible
so all 3 together mean:
([^"]*?)
capture all the non-quote characters you can and put them in a group
"" 2 quotes
\Z end of string
So the whole expression reads: match a word starting with 2 quotes, then no quotes,then ending with 2 quotes.
The second part
"$1"
replaces everything matched with: a quote then the capture group then a quote.
I could have written the expression
STR=prxchange('s/\A"("[^"]*")"\Z/$1/',-1,trim(STR));
to the same effect
The very first letter can be m (often omitted) for matching or s for substituting.
It's a steep learning curve, but I taught myself so you can too. 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.