Hello,
I have this nice regex:
PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2)
There is one record where DIRETI2 has
¬01210 BRUXELLES¬¬BÉLGICA
It doesn't match because of the accented character (É).
Is there an elegant way to include accents?
Thanks,
Perl itself has the metacharacter \p{L}, which matches an alphabetic character in 'any' language; however, it is not available in SAS. Your options would be to run the input through a translation table (trantab) or to add the acceptable non-ASCII characters to your regular expression:
data _null_;
input DIRETI2 $80.;
rc=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-ZáéíóúÁÉÍüÓÚâêÄîôßûÂÊÎÔÛäüöÄÜÖýµäöü'.,-]{2,40}/",DIRETI2);
put direti2= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
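The other option mentioned above, translating the input before matching, could be sketched roughly as follows. This is only an illustration: it uses the TRANSLATE function rather than a real translation table, the variable name plain and the list of accented characters are made up for the example and are not complete, and it assumes a single-byte session encoding such as latin1 (TRANSLATE works byte by byte).

```sas
data _null_;
  length DIRETI2 plain $80;
  DIRETI2 = '¬01210 BRUXELLES¬¬BÉLGICA';

  /* Map a few accented capitals to their unaccented form.            */
  /* Illustrative list only; assumes a single-byte encoding (latin1). */
  plain = translate(DIRETI2, 'AEIOUAEIOU', 'ÁÉÍÓÚÂÊÎÔÛ');

  /* The original, ASCII-only pattern is applied to the cleaned copy */
  rc = prxmatch("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/", plain);
  put plain= rc=;
run;
```

Under those assumptions rc should be nonzero for this record, and the original DIRETI2 value is left untouched.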
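Character Class Groupings are what you might be looking for; see SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition.
Also: outside a character class, an unescaped '.' has the meaning of a wildcard, so mask it as '\.' there. Inside square brackets it is already literal, so the masking below is harmless but not required.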
data test;
DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
rc1=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2);
rc2=PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
run;
Whether this works depends on your session encoding.
With session encoding latin1 it will work. With session encoding utf-8 it will NOT work...
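If you are not sure which encoding your session uses, something like the following should write it to the log. It is a small sketch using the automatic macro variable SYSENCODING and the GETOPTION function; the variable name enc is just for illustration.

```sas
/* Write the session encoding to the log from macro code */
%put Session encoding: &sysencoding;

/* The same check inside a DATA step */
data _null_;
  length enc $40;
  enc = getoption('encoding');  /* value of the ENCODING= system option */
  put enc=;
run;
```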
I suppose our encoding is latin1.
PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
Works fine for me. No more issues with accents.
Thank you!