I want to match all English and international letters like áñéúáóíÁÑçâèôïéêëààñÉ, but I don't want to match the underscore or whatever else regex considers to be part of a "word."
In the following article, I tried \p{L} but it doesn't work in SAS 9.2
Accepted Solutions
data _null_;
infile cards dsd dlm=':';
input x $ name:$26.;
valid = not lengthn(compress(name,' .-','A'));
put (_all_)(=);
cards;
x=Good name=Martha Jones-Smith valid=1
x=Invalid name=Martha Jones=Smith valid=0
x=Good name=Robert Smith Jr. valid=1
x=Invalid name=Robert Smith Jr. (Bob) valid=0
x=Invalid name=Robert Smith Jr, valid=0
x=Invalid name=0áñéúáóíÁÑçâèôïéêëààñÉ valid=0
x=Good name=áñéúáóíÁÑçâèôïéêëààñÉ valid=1
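For anyone wondering why the one-liner works, here is a minimal sketch that splits it into two steps. The variable name leftover is only for illustration and is not part of the original solution. The 'A' modifier adds alphabetic characters to the list that COMPRESS removes, so after letters, spaces, periods, and hyphens are stripped, anything that remains is an invalid character, and LENGTHN returns 0 when nothing remains. Whether accented letters count as alphabetic presumably depends on the session encoding and locale; the log above shows they did in that session.

```sas
data _null_;
   length name $40 leftover $40;
   do name = 'Martha Jones-Smith', 'Martha Jones=Smith', 'Robert Smith Jr. (Bob)';
      /* remove letters (modifier 'A') plus space, period and hyphen */
      leftover = compress(name, ' .-', 'A');
      /* LENGTHN is 0 for a blank string, so NOT turns "nothing left" into 1 */
      valid = not lengthn(leftover);
      put name= leftover= valid=;
   end;
run;
```

Printing leftover next to valid also shows which characters caused a name to be rejected, which can help when cleaning up the data afterwards.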
Would ANYALPHA help?
data _null_;
a='0áñéúáóíÁÑçâèôïéêëààñÉ';
b=anyalpha(a);
put _all_;
run;
a=0áñéúáóíÁÑçâèôïéêëààñÉ b=2 _ERROR_=0 _N_=1
I am not sure from the way you ask your question, but I think you want to remove all non-alpha characters from a string that contains both English and non-English text (such as French, Spanish, or German, i.e. non-DBCS languages). \p is not a valid metacharacter for SAS, even though it is for Perl. Nothing directly equivalent comes to mind; however, I believe [[:alpha:]] will work for your needs. This could also be accomplished using COMPRESS.
data _null_;
a='0áñéúáóí-ÁÑçâè _ôïéêëààñÉ';
b=prxchange('s/[[:^alpha:]]//o',-1,a);
c=compress(a,,'ka');
put (a--c) (=/);
run;
a=0áñéúáóí-ÁÑçâè _ôïéêëààñÉ
b=áñéúáóíÁÑçâèôïéêëààñÉ
c=áñéúáóíÁÑçâèôïéêëààñÉ
If you do care about DBCS languages, look into KCOMPRESS.
Thank you for the replies, FriedEgg and Ksharp, but I don't want to change any characters. I'm validating that a person's name contains only valid characters (explained more in post #4).
Thank you for the reply, data_null_. ANYALPHA is interesting but not helpful. To be more specific, I want to make sure a person's name (as it is recorded) contains only valid characters: alphabetic characters (including international characters), space, period, and hyphen. For example:
Good: Martha Jones-Smith
Invalid: Martha Jones=Smith
Good: Robert Smith Jr.
Invalid: Robert Smith Jr. (Bob)
Invalid: Robert Smith Jr,
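The rule described here can also be sketched as a regular expression, reusing the [[:alpha:]] class from the earlier PRXCHANGE reply. This is only an illustration, not the accepted solution, and it assumes [[:alpha:]] covers the international letters in your session the way it did in that reply's log; that behaviour may vary with encoding and locale.

```sas
data _null_;
   length name $40;
   do name = 'Martha Jones-Smith', 'Robert Smith Jr.',
             'Robert Smith Jr. (Bob)', 'Robert Smith Jr,';
      /* anchored pattern: the whole value must consist of letters, */
      /* spaces, periods and hyphens (hyphen last so it is literal)  */
      valid = (prxmatch('/^[[:alpha:] .-]+$/o', trim(name)) > 0);
      put name= valid=;
   end;
run;
```

Nothing is changed in the name; PRXMATCH only reports whether the pattern matched. Note that a completely blank name passes this check, because a space is in the allowed set, so a separate test for missing values may be wanted.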
I would use KSUBSTR() to pull out a single character each time and compare it with the values you don't want. Some dummy code like:
/* loop over every character; testing for a missing _temp would stop at the first space */
do i=1 to klength(name);
_temp=ksubstr(name,i,1);
if _temp not in ('_' '0' '1' 'a' 'b') then put 'Found:' _temp;
end;
But I think this is just the beginning. This problem has been annoying me for a long time.
Ksharp