SAS Programming

AndrewZ · Posted 05-08-2012 06:50 PM

I want to match all English and international letters like áñéúáóíÁÑçâèôïéêëààñÉ, but I don't want to match the underscore or whatever else regex considers to be part of a "word."

I tried \p{L}, as suggested in the following article, but it doesn't work in SAS 9.2:

regex - How to match the international alphabet (English a-z, + non English) with a regular expressi...


data_null__ · Posted 05-08-2012 07:34 PM

Would ANYALPHA help?

data _null_;
   a='0áñéúáóíÁÑçâèôïéêëààñÉ';
   b=anyalpha(a);
   put _all_;
run;

a=0áñéúáóíÁÑçâèôïéêëààñÉ b=2 _ERROR_=0 _N_=1
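ANYALPHA returns the position of the first alphabetic character, which is why b=2 above: the leading 0 occupies position 1. Its companion NOTALPHA searches for the first character that is not alphabetic. A minimal sketch of the two side by side, using a made-up ASCII value and hypothetical variable names:

```sas
data _null_;
   a = '0abc';
   /* position of the first alphabetic character (0 if there is none) */
   first_alpha = anyalpha(a);
   /* position of the first non-alphabetic character (0 if there is none) */
   first_nonalpha = notalpha(a);
   put a= first_alpha= first_nonalpha=;
run;
```

Whether accented letters count as alphabetic for these functions depends on the session encoding and locale, so it is worth checking with your own data.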

FriedEgg · Posted 05-08-2012 08:01 PM

I am not sure from the way you ask your question, but I think you want to remove all non-alpha characters from a string that contains both English and non-English characters (such as French, Spanish, or German, i.e. non-DBCS languages). \p is not a valid metacharacter in SAS, even though it is in Perl. Nothing directly equivalent comes to mind; however, I believe [[:alpha:]] will work for your needs. This could also be accomplished using COMPRESS.

data _null_;
   a='0áñéúáóí-ÁÑçâè _ôïéêëààñÉ';
   b=prxchange('s/[[:^alpha:]]//o',-1,a);
   c=compress(a,,'ka');
   put (a--c) (=/);
run;

a=0áñéúáóí-ÁÑçâè _ôïéêëààñÉ
b=áñéúáóíÁÑçâèôïéêëààñÉ
c=áñéúáóíÁÑçâèôïéêëààñÉ

If you do care about DBCS languages, look into KCOMPRESS.
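If the goal is to test a string rather than clean it, the same POSIX class could in principle be used in a match instead of a substitution. The following is only a sketch: the sample names and the variable valid are illustrative, and whether [[:alpha:]] covers accented letters depends on the session encoding, as it evidently did in the run above.

```sas
data _null_;
   length name $40;
   do name = 'Martha Jones-Smith', 'Martha Jones=Smith', 'Robert Smith Jr.';
      /* 1 only if every character is a letter, blank, period or hyphen */
      valid = prxmatch('/^[[:alpha:] .\-]+$/', strip(name)) > 0;
      put name= valid=;
   end;
run;
```

The pattern is anchored with ^ and $ so that a single disallowed character anywhere in the value makes the match fail.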

AndrewZ · Posted 05-09-2012 11:31 AM

Thank you for the replies, FriedEgg and Ksharp, but I don't want to change any characters. I'm validating that a person's name contains only valid characters (explained more in post #4).

AndrewZ · Posted 05-09-2012 11:28 AM

Thank you for the reply, data_null__. ANYALPHA is interesting but not helpful here. To be more specific, I want to make sure a person's name (as it is recorded) contains only valid characters: alphabetic characters (including international characters), space, period, and hyphen. For example:

Good: Martha Jones-Smith

Invalid: Martha Jones=Smith

Good: Robert Smith Jr.

Invalid: Robert Smith Jr. (Bob)

Invalid: Robert Smith Jr,

data_null__ · Posted 05-09-2012 12:12 PM

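/* data lines reconstructed from the log output below; the log did not echo them */
data _null_;
   infile cards dsd dlm=':';
   input x $ name:$26.;
   /* strip letters ('A' modifier), blanks, periods and hyphens; anything left is invalid */
   valid = not lengthn(compress(name,' .-','A'));
   put (_all_)(=);
   cards;
Good:Martha Jones-Smith
Invalid:Martha Jones=Smith
Good:Robert Smith Jr.
Invalid:Robert Smith Jr. (Bob)
Invalid:Robert Smith Jr,
Invalid:0áñéúáóíÁÑçâèôïéêëààñÉ
Good:áñéúáóíÁÑçâèôïéêëààñÉ
;

x=Good name=Martha Jones-Smith valid=1
x=Invalid name=Martha Jones=Smith valid=0
x=Good name=Robert Smith Jr. valid=1
x=Invalid name=Robert Smith Jr. (Bob) valid=0
x=Invalid name=Robert Smith Jr, valid=0
x=Invalid name=0áñéúáóíÁÑçâèôïéêëààñÉ valid=0
x=Good name=áñéúáóíÁÑçâèôïéêëààñÉ valid=1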
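To see why the one-line test works, it may help to split it into steps. This is a sketch only; the variable leftover is a hypothetical name introduced for illustration. The 'A' modifier adds alphabetic characters to the list of characters that COMPRESS removes, so whatever survives is by definition not a letter, blank, period, or hyphen.

```sas
data _null_;
   length name leftover $26;
   name = 'Robert Smith Jr. (Bob)';
   /* remove letters ('A' modifier) plus blank, period and hyphen */
   leftover = compress(name, ' .-', 'A');
   /* LENGTHN returns 0 for a blank value, so NOT maps "nothing left" to 1 */
   valid = not lengthn(leftover);
   put name= leftover= valid=;
run;
```

For this sample value only the two parentheses remain in leftover, so valid is 0. Printing leftover is also a convenient way to report which characters caused a name to be rejected.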

Ksharp · Posted 05-09-2012 04:12 AM

I would use KSUBSTR() to pull out a single character at a time and compare it with the values you don't want. Some dummy code:

/* loop to KLENGTH so an embedded blank does not end the scan early */
do i=1 to klength(name);
   _temp=ksubstr(name,i,1);
   if _temp not in ('_' '0' '1' 'a' 'b') then put 'Found:' _temp;
end;

But I think this is just a beginning. This problem has annoyed me for a long time.

Ksharp

SAS Programming

Regular expression for Unicode letters
