BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Dear all,

 

I expect to do some punctuation cleaning.

 

1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.

 

2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,

“””%””” and not “””%””%”””), should have quotation marks removed.

 

3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.

 

4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”)  at the end of a name that are not relevant should be removed.

 

Could you please give me some suggestions?

thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

or regular expressions.

data HAVE;
  STR='""  a "n "" '; output;
  STR='"  a  n  "  '; output;
  STR='""  a  n "" '; output;
  STR='~~1~~       '; output;
  STR='#~1~#       '; output;
run;
data WANT;
  set HAVE;
 *2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
 *1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR)); 
 *1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
 *34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR)); 
  put STR=;
run;

STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#

 

View solution in original post

6 REPLIES 6
Reeza
Super User
  • CHAR() to get specific characters to check for space or quotation marks
  • SCAN() to get information between quotes
  • NOTDIGIT(), NOTALPHA(), NOTALPHANUM() to get types of characters/numbers

 

 

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=lefunctionsref&docsetT...

 

If you're feeling adventurous, learning pearl regular expressions is likely your best overall solution for manipulating text data.

 

You can put some of your test data here and test your regular expressions. I find this helpful for building mine. 

https://regexr.com

 


@Alexxxxxxx wrote:

Dear all,

 

I expect to do some punctuation cleaning.

 

1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.

 

2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,

“””%””” and not “””%””%”””), should have quotation marks removed.

 

3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.

 

4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”)  at the end of a name that are not relevant should be removed.

 

Could you please give me some suggestions?

thanks in advance.


 

Alexxxxxxx
Pyrite | Level 9
Dear Reeza,

thanks for sharing.

do you know where I can learn the meaning of strings like 's/(\w+), (\w+)/$2 $1' which added in prxchange()?
Reeza
Super User
The second link I sent has that type of information - this is perl regular expressions.
ChrisNZ
Tourmaline | Level 20

or regular expressions.

data HAVE;
  STR='""  a "n "" '; output;
  STR='"  a  n  "  '; output;
  STR='""  a  n "" '; output;
  STR='~~1~~       '; output;
  STR='#~1~#       '; output;
run;
data WANT;
  set HAVE;
 *2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
 *1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR)); 
 *1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
 *34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR)); 
  put STR=;
run;

STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#

 

Alexxxxxxx
Pyrite | Level 9
Dear @ChriNZ,

thanks for your sharing. Could you please explain the meaning of code in prxchange()?
Besdies, Could you please let me know where I can learn this code?
ChrisNZ
Tourmaline | Level 20

There are heaps of tutorials online for pearl regular expressions.

And validations sites too, like like one: https://regex101.com/  whichi I use. 

I will explain how your expression is parsing your text.

My book has a chapter with a dictionary of all SAS-supported expressions (with short examples but no tutorial).

For example my first expression (I removed the ? as it's not needed)

\A""([^"]*)""\Z

 reads as: 

\A  start of string

""    2 quotes

()    a group

[^"] anything but double quotes

*     match as many as possible 

  so all 3 together mean:

([^"]*?) capture all the non-quote characters you can and put them in a group

""    2 quotes

\Z   end of string

So the whole expression reads: match a word starting with 2 quotes, then no quotes,then ending with 2 quotes.

 

The second part 

"$1"

 replaces everything matched with: a quote then the capture group then a quote.

I could have written the expression

STR=prxchange('s/\A"("[^"]*")"\Z/$1/',-1,trim(STR));

to the same effect

The very first letter can be m (often omitted) for matching or s for substituting.

It's a steep learning curve, but I taught myself so you can too. 🙂

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2022 views
  • 4 likes
  • 3 in conversation