Help using Base SAS procedures

Pearl question

Occasional Contributor car
Posts: 5

Pearl question

Hello,
I have a nice regex:

PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2)

There is one record where DIRETI2 has:

¬01210 BRUXELLES¬¬BÉLGICA

It doesn't match because of the accent (É).

Is there an elegant way to include accents?

Thanks,

Trusted Advisor
Posts: 1,301

Re: Pearl question

In Perl itself there is a metacharacter \p{L}, which matches an alphabetical character in 'any' language; however, it is not available in SAS. Your options would be to translate the input (for example with a TRANTAB) or to add the acceptable non-ASCII characters to your regular expression:

data _null_;
  input DIRETI2 $80.;
  rc=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-ZáéíóúÁÉÍüÓÚâêÄîôßûÂÊÎÔÛäüöÄÜÖýµäöü'.,-]{2,40}/",DIRETI2);
  put direti2= rc=;
cards;
¬01210 BRUXELLES¬¬BÉLGICA
;
run;
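
The other option mentioned above, translating the input before matching, might look like the sketch below. It assumes a single-byte session encoding such as latin1 and uses the TRANSLATE function rather than a TRANTAB. The variable name plain and the list of accented letters are illustrative only, so extend the list to whatever your data actually contains.

```sas
data _null_;
  length DIRETI2 plain $80;
  DIRETI2 = '¬01210 BRUXELLES¬¬BÉLGICA';

  /* TRANSLATE(source, to, from): each character of the third argument */
  /* is replaced by the character at the same position in the second.  */
  /* Both lists hold 16 characters (assumed set, adjust as needed).    */
  plain = translate(DIRETI2, 'AEIOUAEIOUAEIOUC', 'ÁÉÍÓÚÂÊÎÔÛÄËÏÖÜÇ');

  /* The original ASCII-only pattern is then applied to the copy. */
  rc = prxmatch("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/", plain);
  put DIRETI2= plain= rc=;
run;
```

Matching against a translated copy leaves DIRETI2 itself untouched, so the accented value is still available for output.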

Respected Advisor
Posts: 4,173

Re: Pearl question

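Character class groupings are what you might be looking for; see SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition.

Also: make sure to escape a '.' when it appears outside a character class, as otherwise it acts as a wildcard. Inside [...] it is already literal, so the backslash below is harmless but not strictly required.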

data test;
  DIRETI2='¬01210 BRUXELLES¬¬BÉLGICA';
  rc1=PRXMATCH("/^¬[A-Z0-9 '.,-]{8,40}¬¬[A-Z'.,-]{2,40}/",DIRETI2);
  rc2=PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);
run;

Trusted Advisor
Posts: 1,301

Re: Pearl question

Whether this works depends on your session encoding.

With session encoding latin1 it will work. With session encoding utf-8 it will NOT work...
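
If you are not sure which encoding your session uses, a quick check along these lines should tell you. This is a minimal sketch: it relies on the automatic macro variable SYSENCODING and the ENCODING system option, and the wording of the log message is only illustrative.

```sas
/* Write the session encoding to the log (e.g. latin1, wlatin1, utf-8). */
%put Session encoding: &SYSENCODING;

/* The same information, reported via the ENCODING system option. */
proc options option=encoding;
run;
```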

Occasional Contributor car
Posts: 5

Re: Pearl question

I suppose our encoding is latin1.

PRXMATCH("/^¬[[:alnum:] '\.,-]{8,40}¬¬[[:alpha:]'\.,-]{2,40}/o",DIRETI2);

Works fine for me. No more issues with accents.

Thank you!
