Hello,
I have this nice regex:
PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2)
There is one record where DIRETI2 has
¬01210 BRUXELLES¬¬BÉLGICA
It doesn't match because of the accent...
Is there an elegant way to include accents?
Thanks,
Perl itself has a metacharacter, \p{L}, which matches an alphabetic character in 'any' language; however, it is not available in SAS. Your options are to translate the input with a trantab or to add the acceptable non-ASCII characters to your regular expression:
data _null_;
input DIRETI2 $80.;
rc=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-ZáéíóúÁÉÍüÓÚâêÄîôßûÂÊÎÔÛäüöÄÜÖýµäöü'.,-]{2,40}/",DIRETI2);
put direti2= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
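The other option mentioned above, translating the input before matching, could look roughly like the sketch below. It uses the TRANSLATE function rather than a trantab, and the variable name plain and the list of accented characters are only illustrative assumptions; extend the list to whatever actually occurs in your data. It also assumes a single-byte session encoding such as latin1, because TRANSLATE works byte by byte.

```sas
data _null_;
  input DIRETI2 $80.;
  length plain $80;  /* hypothetical helper variable holding the unaccented copy */
  /* TRANSLATE(source, to, from): each character in 'from' is replaced by the
     character at the same position in 'to'. Assumes a single-byte session
     encoding (e.g. latin1); the character list is an example, not exhaustive. */
  plain = translate(DIRETI2, 'AEIOUAEIOUAOUC', 'ÁÉÍÓÚÂÊÎÔÛÄÖÜÇ');
  /* The original ASCII-only pattern can now be applied to the translated copy */
  rc = prxmatch("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/", plain);
  put DIRETI2= plain= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
```

This keeps the regular expression short and leaves the original DIRETI2 value untouched, at the cost of maintaining the translation list.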
Character class groupings are what you might be looking for; see SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition.
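Also: make sure to mask a '.' outside a character class, as otherwise it has the meaning of a wildcard. Inside [...] it is already literal, so the backslash used here is harmless but optional.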
data test;
DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
rc1=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2);
rc2=PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
run;
Whether this works depends on your session encoding.
With session encoding = latin1, it will work. With session encoding = utf-8, it will NOT work, because accented characters are then stored as more than one byte...
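If you are not sure which encoding your session uses, a quick check along these lines should tell you. Both the SYSENCODING automatic macro variable and the ENCODING system option are standard SAS; the wording of the log message is just an example.

```sas
/* Write the current session encoding to the log */
%put Session encoding: &SYSENCODING;

/* Alternative: show the value of the ENCODING system option */
proc options option=encoding;
run;
```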
I suppose our encoding is latin1.
PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
Works fine for me. No more issues with accents.
Thank you!