SAS Programming

DATA Step, Macro, Functions and more
AndrewZ
Quartz | Level 8

I want to match all English and international letters, like áñéúáóíÁÑçâèôïéêëààñÉ, but I don't want to match the underscore or anything else that regex considers part of a "word."

Following the article below, I tried \p{L}, but it doesn't work in SAS 9.2.

regex - How to match the international alphabet (English a-z, + non English) with a regular expressi...

Accepted solution
data_null__
Jade | Level 19

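data _null_;
   infile cards dsd dlm=':';
   input x $ name :$26.;
   /* Remove blanks, periods, hyphens and (modifier A) letters; if    */
   /* anything is left over, the name contains an invalid character. */
   valid = not lengthn(compress(name,' .-','A'));
   put (_all_)(=);
   cards;
Good:Martha Jones-Smith
Invalid:Martha Jones=Smith
Good:Robert Smith Jr.
Invalid:Robert Smith Jr. (Bob)
Invalid:Robert Smith Jr,
Invalid:0áñéúáóíÁÑçâèôïéêëààñÉ
Good:áñéúáóíÁÑçâèôïéêëààñÉ
;
run;

x=Good name=Martha Jones-Smith valid=1
x=Invalid name=Martha Jones=Smith valid=0
x=Good name=Robert Smith Jr. valid=1
x=Invalid name=Robert Smith Jr. (Bob) valid=0
x=Invalid name=Robert Smith Jr, valid=0
x=Invalid name=0áñéúáóíÁÑçâèôïéêëààñÉ valid=0
x=Good name=áñéúáóíÁÑçâèôïéêëààñÉ valid=1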
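A small variation on the accepted check: keeping the leftover characters in a variable shows why a name failed, not just that it failed. This is a sketch, assuming a single-byte session encoding such as WLATIN1 like the one the log above appears to come from (COMPRESS is not meant for multibyte data; FriedEgg's reply below points to KCOMPRESS for DBCS). The data set name names_checked and the variable bad are illustrative only.

```sas
/* Sketch: flag invalid names and keep the offending characters.      */
/* NAMES_CHECKED and BAD are illustrative names, not from the thread. */
data names_checked;
   length name $ 60 bad $ 60;
   input name $ 1-60;
   /* Remove blanks, periods, hyphens and (modifier A) letters;       */
   /* whatever remains is a character that is not allowed.            */
   bad   = compress(name, ' .-', 'A');
   valid = (lengthn(bad) = 0);
   datalines;
Martha Jones-Smith
Martha Jones=Smith
Robert Smith Jr.
Robert Smith Jr. (Bob)
Robert Smith Jr,
;
run;

proc print data=names_checked;
run;
```

As with the accepted answer, an all-blank name also passes this check, so a separate test such as missing(name) may be wanted.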


6 replies
data_null__
Jade | Level 19

Would ANYALPHA help?

data _null_;
   a='0áñéúáóíÁÑçâèôïéêëààñÉ';
   /* position of the first alphabetic character in A */
   b=anyalpha(a);
   put _all_;
run;

a=0áñéúáóíÁÑçâèôïéêëààñÉ b=2 _ERROR_=0 _N_=1
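ANYALPHA reports where the first letter is; its counterpart NOTALPHA reports where the first non-letter is, and returns 0 when every character is alphabetic. A minimal sketch on the same string (the data set name alpha_positions is illustrative; which accented characters count as alphabetic depends on the session encoding):

```sas
/* Sketch: compare ANYALPHA and NOTALPHA on the same value.     */
/* ALPHA_POSITIONS is an illustrative data set name.            */
data alpha_positions;
   a = '0áñéúáóíÁÑçâèôïéêëààñÉ';
   /* ANYALPHA: position of the first alphabetic character */
   first_alpha = anyalpha(a);
   /* NOTALPHA: position of the first non-alphabetic character; */
   /* 0 would mean every character of the trimmed value is a letter */
   first_nonalpha = notalpha(trim(a));
   put first_alpha= first_nonalpha=;
run;
```

Because valid names may also contain blanks, periods and hyphens, NOTALPHA alone cannot validate a full name: for "Martha Jones-Smith" it would stop at the blank in position 7.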

FriedEgg
SAS Employee

I am not sure from the way you asked your question, but I think you want to remove all non-alphabetic characters from a string that contains both English and non-English letters (such as French, Spanish, or German, i.e. non-DBCS languages). \p is not a valid metacharacter in SAS, even though it is in Perl. Nothing directly equivalent comes to mind, but I believe [[:alpha:]] will work for your needs. This could also be accomplished using COMPRESS.

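data _null_;
   a='0áñéúáóí-ÁÑçâè _ôïéêëààñÉ';
   /* delete every character that is not alphabetic; o = compile the pattern once */
   b=prxchange('s/[[:^alpha:]]//o',-1,a);
   /* k = keep (rather than remove) the listed characters, a = alphabetic */
   c=compress(a,,'ka');
   put (a--c) (=/);
run;

a=0áñéúáóí-ÁÑçâè _ôïéêëààñÉ
b=áñéúáóíÁÑçâèôïéêëààñÉ
c=áñéúáóíÁÑçâèôïéêëààñÉ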

If you need to handle DBCS languages, look into KCOMPRESS.
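The same [[:alpha:]] class can test a name instead of changing it. This is a minimal sketch, assuming (as in AndrewZ's clarification) that blanks, periods and hyphens are also allowed, and a single-byte session encoding; under UTF-8 the PRX functions may not treat accented letters as alphabetic. The data set name names_rx is illustrative. Anchoring the class with ^ and $ makes PRXMATCH succeed only when every character of the trimmed name belongs to the class.

```sas
/* Sketch: validate with an anchored POSIX class instead of removing */
/* characters. NAMES_RX is an illustrative data set name.            */
data names_rx;
   length name $ 60;
   input name $ 1-60;
   /* ^ and $ tie the class to the whole trimmed value; the hyphen is  */
   /* placed last in the class, so it is a literal character.          */
   valid = (prxmatch('/^[[:alpha:] .-]+$/', trim(name)) > 0);
   datalines;
Martha Jones-Smith
Martha Jones=Smith
Robert Smith Jr.
Robert Smith Jr. (Bob)
Robert Smith Jr,
;
run;

proc print data=names_rx;
run;
```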

AndrewZ
Quartz | Level 8

Thank you for the replies, FriedEgg and Ksharp, but I don't want to change any characters. I'm validating that a person's name contains only valid characters (explained in more detail in post #4).

AndrewZ
Quartz | Level 8

Thank you for the reply, data_null_. ANYALPHA is interesting but not helpful. To be more specific, I want to make sure a person's name (as it is recorded) contains only valid characters: alphabetic characters (including international characters), spaces, periods, and hyphens. For example:

Good: Martha Jones-Smith

Invalid: Martha Jones=Smith

Good: Robert Smith Jr.

Invalid: Robert Smith Jr. (Bob)

Invalid: Robert Smith Jr,


Ksharp
Super User

I would use KSUBSTR() to pull out one character at a time and compare it with the values you don't want. Some dummy code:

length _temp $ 4;
do i=1 to klength(name);
   _temp=ksubstr(name,i,1);
   /* report any character that is in the list of unwanted values */
   if _temp in ('_' '0' '1' '=' ',') then put 'Found: ' _temp;
end;

But I think this is only a starting point; this problem has been annoying me for a long time.
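The character-by-character comparison can also be handed to KVERIFY, which returns the position of the first character in its source that does not appear in the excerpt (0 if every character does). This is a sketch under stated assumptions: the whitelist is hypothetical and only covers ASCII letters plus the accented letters quoted in the question, so any other letter would be reported as invalid, and the program file's encoding must match the session encoding. The data set name kverify_check is illustrative.

```sas
/* Sketch: whitelist check with KVERIFY instead of an explicit loop. */
/* KVERIFY_CHECK is an illustrative data set name.                   */
data kverify_check;
   length name $ 60 allowed $ 120;
   /* Hypothetical whitelist: ASCII letters, the accented letters quoted */
   /* in the question, plus blank, period and hyphen.                    */
   allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
             || 'áñéúóíÁÑçâèôïêëàÉ' || ' .-';
   name = 'Martha Jones=Smith';
   /* position of the first character of NAME not found in ALLOWED, */
   /* or 0 when every character is allowed                           */
   pos = kverify(trim(name), allowed);
   put name= pos=;
run;
```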

Ksharp
