Re: Pearl question

car · Posted 11-10-2014 02:36 PM

Hello,
I have this nice regex:

PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2)

There is one record where DIRETI2 has

¬01210 BRUXELLES¬¬BÉLGICA

It doesn't match because of the accent...

Is there an elegant way to include accents?

Thanks,

FriedEgg · Posted 11-10-2014 03:30 PM

In Perl itself there is a property escape, \p{L}, which matches an alphabetic character in 'any' language; however, it is not available in SAS. Your options would be to trantab the input or to add the acceptable non-ASCII characters to your regular expression:

data _null_;
  input DIRETI2 $80.;
  rc=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-ZáéíóúÁÉÍüÓÚâêÄîôßûÂÊÎÔÛäüöÄÜÖýµäöü'.,-]{2,40}/",DIRETI2);
  put direti2= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
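The other option, translating the input before matching, could look roughly like the sketch below. It uses the TRANSLATE function rather than an actual TRANTAB, and it assumes a single-byte session encoding such as latin1. The variable name plain and the list of accented capitals are only illustrative, so extend the list to whatever your data really contains.

```sas
data _null_;
  DIRETI2 = '¬01210 BRUXELLES¬¬BÉLGICA';
  /* TRANSLATE(source, to, from): replace each accented capital   */
  /* with its unaccented base letter (both lists have 14 chars).  */
  /* Assumes a single-byte session encoding such as latin1.       */
  plain = translate(DIRETI2, 'AEIOUAEIOUAOUC', 'ÁÉÍÓÚÂÊÎÔÛÄÖÜÇ');
  /* The original ASCII-only pattern can then be applied unchanged */
  rc = prxmatch("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/", plain);
  put plain= rc=;
run;
```

The trade-off is that the accents are lost in plain, so keep DIRETI2 itself if you need the original spelling later.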

Patrick · Posted 11-10-2014 03:30 PM

Character Class Groupings is what you might be looking for: SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

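Also: make sure to escape a '.' where it is meant literally, as otherwise it acts as a wildcard. (Inside a [...] character class it is already literal, so the backslash there is optional.)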

data test;
  DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
  rc1=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2);
  rc2=PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
run;

FriedEgg · Posted 11-10-2014 03:39 PM

Whether this works will depend on your session encoding.

With session encoding = latin1, it will work. With session encoding = utf-8, it will NOT work...
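If you are not sure which encoding your session uses, something like the following should show it. This is just a quick check, and the exact value printed (for example wlatin1 versus latin1) will vary by platform.

```sas
/* Write the session encoding to the log via the automatic macro variable */
%put Session encoding: &SYSENCODING;

/* Or display the ENCODING system option */
proc options option=encoding;
run;
```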

car · Posted 11-11-2014 01:45 PM

I suppose our encoding is latin1.

PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);

Works fine for me. No more issues with accents.

Thank you!
