Hello,
I have this regex:
PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2)
There is one record where DIRETI2 has the value
¬01210 BRUXELLES¬¬BÉLGICA
It doesn't match because of the accented character (É).
Is there an elegant way to include accents?
Thanks,
In Perl itself there is a metacharacter, \p{L}, which matches a letter in any language; however, it is not available in SAS. Your options are to trantab the input (translate it with a translation table) or to add the acceptable non-ASCII characters to your regular expression:
data _null_;
input DIRETI2 $80.;
/* Accented letters listed explicitly (duplicates removed) */
rc=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-ZáéíóúÁÉÍÓÚâêîôûÂÊÎÔÛäöüÄÖÜßýµ'.,-]{2,40}/",DIRETI2);
put direti2= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
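If you prefer the first option (translating the input), here is a minimal sketch. It uses the TRANSLATE function instead of a PROC TRANTAB table, and the list of accented capitals is only an example that you would extend for your data. The helper variable PLAIN is a hypothetical name. It assumes a single-byte session encoding such as latin1; in a UTF-8 session you would likely need KTRANSLATE instead.

```sas
data _null_;
  length DIRETI2 PLAIN $80;  /* PLAIN: hypothetical helper variable */
  DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
  /* TRANSLATE(source, to, from): each character in FROM is replaced
     by the character at the same position in TO.
     Assumes a single-byte encoding such as latin1. */
  PLAIN=translate(DIRETI2, 'AEIOUAEIOUAEIOUAOU', 'ÁÉÍÓÚÀÈÌÒÙÂÊÎÔÛÄÖÜ');
  /* The original ASCII-only pattern can now be applied unchanged */
  rc=prxmatch("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/", PLAIN);
  put PLAIN= rc=;
run;
```

In a latin1 session the log should show PLAIN with BELGICA and rc=1. The trade-off is that you keep a simple pattern but have to maintain the translation list yourself.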
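Character Class Groupings may be what you are looking for: see SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition.
Also: outside a character class an unescaped '.' is a wildcard that matches any character, so escape it as '\.' when you mean a literal period. Inside [...] a '.' is already literal, so escaping it there is harmless but optional.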
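data test;
DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
/* Original pattern: ASCII capitals only, so BÉLGICA does not match */
rc1=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2);
/* POSIX classes [[:alnum:]] and [[:alpha:]]; the o modifier compiles the pattern only once */
rc2=PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
put rc1= rc2=;
run;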
Whether this works depends on your session encoding. With session encoding latin1 it works; with session encoding UTF-8 it will NOT, presumably because É is then stored as two bytes and the PRX functions match byte by byte.
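To check which encoding your session uses, a quick sketch like the following may help. SYSENCODING is a standard automatic macro variable and PROC OPTIONS reports the ENCODING system option. The LENGTH versus KLENGTH comparison is an illustration I'm adding under the assumption above that the PRX functions work on bytes: LENGTH counts bytes and KLENGTH counts characters, so if they differ for an accented word, the accent occupies more than one byte.

```sas
/* Write the session encoding to the log */
%put &=sysencoding;

proc options option=encoding;
run;

data _null_;
  word='BÉLGICA';
  bytes=length(word);   /* number of bytes */
  chars=klength(word);  /* number of characters */
  put bytes= chars=;
run;
```

In a latin1 session both values should be 7; in a UTF-8 session LENGTH should report 8 because É takes two bytes.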
I suppose our encoding is latin1.
PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
Works fine for me. No more issues with accents.
Thank you!