BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Dear all,

 

I expect to do some punctuation cleaning.

 

1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.

 

2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,

“””%””” and not “””%””%”””), should have quotation marks removed.

 

3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.

 

4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”)  at the end of a name that are not relevant should be removed.

 

Could you please give me some suggestions?

thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

or regular expressions.

data HAVE;
  STR='""  a "n "" '; output;
  STR='"  a  n  "  '; output;
  STR='""  a  n "" '; output;
  STR='~~1~~       '; output;
  STR='#~1~#       '; output;
run;
data WANT;
  set HAVE;
 *2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
 *1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR)); 
 *1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
 *34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR)); 
  put STR=;
run;

STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#

 

View solution in original post

6 REPLIES 6
Reeza
Super User
  • CHAR() to get specific characters to check for space or quotation marks
  • SCAN() to get information between quotes
  • NOTDIGIT(), NOTALPHA(), NOTALPHANUM() to get types of characters/numbers

 

 

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=lefunctionsref&docsetT...

 

If you're feeling adventurous, learning pearl regular expressions is likely your best overall solution for manipulating text data.

 

You can put some of your test data here and test your regular expressions. I find this helpful for building mine. 

https://regexr.com

 


@Alexxxxxxx wrote:

Dear all,

 

I expect to do some punctuation cleaning.

 

1. Names beginning or ending with a double quotation mark( i.e., """ %" or "% """) should not contain a space after the beginning quotation mark or before the ending quotation mark respectively.

 

2. Names that have a double quotation mark at the beginning and the end, and that do not contain any other double quotation mark(i.e.,

“””%””” and not “””%””%”””), should have quotation marks removed.

 

3. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”) at the beginning of a name that are not relevant should be removed.

 

4. Non-alphanumerical characters(i.e., characters except A-Z; 0-9; “””; “@”; “(“; “’”; “#”; “!”; “*”; “/”)  at the end of a name that are not relevant should be removed.

 

Could you please give me some suggestions?

thanks in advance.


 

Alexxxxxxx
Pyrite | Level 9
Dear Reeza,

thanks for sharing.

do you know where I can learn the meaning of strings like 's/(\w+), (\w+)/$2 $1' which added in prxchange()?
Reeza
Super User
The second link I sent has that type of information - this is perl regular expressions.
ChrisNZ
Tourmaline | Level 20

or regular expressions.

data HAVE;
  STR='""  a "n "" '; output;
  STR='"  a  n  "  '; output;
  STR='""  a  n "" '; output;
  STR='~~1~~       '; output;
  STR='#~1~#       '; output;
run;
data WANT;
  set HAVE;
 *2; STR=prxchange('s/\A""([^"]*?)""\Z/"$1"/',-1,trim(STR));
 *1; STR=prxchange('s/\A"( *)/"/',-1,trim(STR)); 
 *1; STR=prxchange('s/( *)"\Z/"/',-1,trim(STR));
 *34;STR=prxchange('s/\A[^a-zA-Z0-9"\@()#!* ]*(.*?)[^a-zA-Z0-9"\@()#!* ]*\Z/$1/',-1,trim(STR)); 
  put STR=;
run;

STR="" a "n ""
STR="a n"
STR="a n"
STR=1
STR=#~1~#

 

Alexxxxxxx
Pyrite | Level 9
Dear @ChriNZ,

thanks for your sharing. Could you please explain the meaning of code in prxchange()?
Besdies, Could you please let me know where I can learn this code?
ChrisNZ
Tourmaline | Level 20

There are heaps of tutorials online for pearl regular expressions.

And validations sites too, like like one: https://regex101.com/  whichi I use. 

I will explain how your expression is parsing your text.

My book has a chapter with a dictionary of all SAS-supported expressions (with short examples but no tutorial).

For example my first expression (I removed the ? as it's not needed)

\A""([^"]*)""\Z

 reads as: 

\A  start of string

""    2 quotes

()    a group

[^"] anything but double quotes

*     match as many as possible 

  so all 3 together mean:

([^"]*?) capture all the non-quote characters you can and put them in a group

""    2 quotes

\Z   end of string

So the whole expression reads: match a word starting with 2 quotes, then no quotes,then ending with 2 quotes.

 

The second part 

"$1"

 replaces everything matched with: a quote then the capture group then a quote.

I could have written the expression

STR=prxchange('s/\A"("[^"]*")"\Z/$1/',-1,trim(STR));

to the same effect

The very first letter can be m (often omitted) for matching or s for substituting.

It's a steep learning curve, but I taught myself so you can too. 🙂

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1440 views
  • 4 likes
  • 3 in conversation