PRXMatch question

Frequent Contributor
Posts: 83

Hello,

 

I recently posted a question about phone number validation, etc., and was referred to this article: https://heuristically.wordpress.com/2012/10/30/phone-number-validation-in-sas/

 

It's a great article and the program works fine. However, when I look through my results, I'm not getting the flags I would expect based on how this portion of the program is written:

 

/* A=area code not in service */
    if substr(&sm_phone_number, 1, 3) not in (&mv_npa) then &sm_exception = 'A'; 
 
    /* R=repeating number like 5555555555 is a probable fake */
    if prxmatch('/^([0-9])(\1{9})$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'R'; 
 
    /* I=Skype and Google GMail phone numbers do not allow inbound calls */
    if &sm_phone_number in ('2025808200', '7607058888') then &sm_exception = 'I'; 
 
    /* D=directory assistance <https://en.wikipedia.org/wiki/555-1212> */
    if prxmatch('/^[0-9]{3}5551212$/', trim(&sm_phone_number)) eq 1 then &sm_exception = 'D'; 
 
    /* F=numbers specifically reserved for fictional use are "555-0100" through "555-0199" */
    if prxmatch('/^[2-9][0-8][0-9]55501[0-9]{2}$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'F'; /* fake */
 
    /* 1=the last two digits of NXX cannot both be 1, to avoid confusion with the N11 
     * codes (http://en.wikipedia.org/wiki/North_American_Numbering_Plan) 
     * Only non-geographic area codes, such as toll-free 800/888/877/866/855 numbers 
     * and 900 numbers may use N11 as the telephone exchange prefix, since 
     * the area code must always be dialed for these numbers. 
     * <https://en.wikipedia.org/wiki/N11_code> */
    if (prxmatch('/^(800|888|877|866|855|900)/', strip(&sm_phone_number)) ne 1) and
        (prxmatch('/^[2-9][0-8][0-9][2-9]11[0-9]{4}\b/', strip(&sm_phone_number)) eq 1) then &sm_exception = '1';
 
    /* S=basic NANP syntax */
    if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(&sm_phone_number)) ne 1 then &sm_exception = 'S'; 

 

For example, all phone numbers in my dataset with sequential numbers like '000000000' or '9999999998', etc., are being flagged as "S" instead of "R", and I can't figure out why. I've been reading all the literature online regarding PRXMatch, but it's not clicking for me.

 

A thought just occurred to me: could this be because my phone number fields are all text?

 

I appreciate any insight into this. Thanks!

Respected Advisor
Posts: 3,156

Re: PRXMatch question

Please explain the purpose of your code, for example what S, A and R mean. Ideally, provide some sample data, the expected outcome and the outcome you get for now.
Trusted Advisor
Posts: 1,117

Re: PRXMatch question

Hello @Ody,

 

You have a series of IF-THEN statements without ELSE statements. So, if one "phone number" satisfies more than one IF condition, it will receive the flag of the last condition met. (The flag is overwritten each time a match is detected.) Hence, every number satisfying the 'S' condition (the last in the list) will get the 'S' flag, regardless of other flags it may have temporarily received before, when the other conditions were checked.

 

Your concrete examples, '000000000' or '9999999998' apparently must have satisfied the 'S' condition. In fact, they do not meet the 'R' condition (but they would receive the 'S' flag nevertheless even if they did): '000000000' has only 9 digits, but 10 are necessary to get the 'R' flag. '9999999998' does not consist of 10 repeated digits, it has nine 9s and one 8. Even if it had ten 9s and one 8 or eleven 9s (and no 8) it would not meet the 'R' criterion, because the regular expression used there requires exactly 10 repeated digits in a row, so it is fairly restrictive.
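To see this for yourself, a minimal sketch like the following could be run. It applies the 'R' and 'S' patterns from your code to a few sample values; the variable names (phone, is_repeat, fails_syntax) and the sample numbers other than your two examples are made up for illustration.

```sas
/* Sketch: check sample values against the 'R' and 'S' patterns. */
/* Variable names and the last two sample numbers are hypothetical. */
data _null_;
    length phone $12;
    input phone $;
    /* 1 when the value is exactly ten copies of one digit, else 0 */
    is_repeat = (prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1);
    /* 1 when the value fails the basic NANP syntax test, i.e. would be flagged 'S' */
    fails_syntax = (prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1);
    put phone= is_repeat= fails_syntax=;
    datalines;
000000000
9999999998
5555555555
2125551234
;
```

In the log, only 5555555555 should show is_repeat=1, while both of your examples should show is_repeat=0 and fails_syntax=1.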

 

It is fine that your phone numbers are stored as text. Otherwise, the application of the character functions SUBSTR, STRIP, etc. to them would enforce automatic numeric-to-character conversions.

Frequent Contributor
Posts: 83

Re: PRXMatch question

Posted in reply to FreelanceReinhard
@FreelanceReinhard

Makes sense. I thought that might be the case, so I commented out the last condition. That defaulted the flags on all the mostly sequential numbers to "A", which matches that condition.
Respected Advisor
Posts: 4,920

Re: PRXMatch question

'000000000' doesn't match '/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/' and that test is performed last, so it sets the flag to 'S'.

If you want the flag from the first positive test, you should use an if - else if - else if - else construct.
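A rough sketch of that construct, using only some of the tests from the original program. The plain variables phone and exception stand in for the macro variables &sm_phone_number and &sm_exception, the sample numbers are invented, and the order of the tests (which decides which flag wins) is just one possible choice.

```sas
/* Sketch: first matching test wins, later tests are skipped. */
/* phone, exception and the sample data are hypothetical stand-ins. */
data flagged;
    length phone $12 exception $1;
    input phone $;
    /* R=ten repeats of the same digit */
    if prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1 then exception = 'R';
    /* D=directory assistance */
    else if prxmatch('/^[0-9]{3}5551212$/', strip(phone)) eq 1 then exception = 'D';
    /* F=reserved for fictional use */
    else if prxmatch('/^[2-9][0-8][0-9]55501[0-9]{2}$/', strip(phone)) eq 1 then exception = 'F';
    /* S=fails basic NANP syntax; only reached when nothing above matched */
    else if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1 then exception = 'S';
    datalines;
9999999999
2125551212
2125550123
9999999998
2125551234
;

proc print data=flagged;
run;
```

With the original independent IF statements, 9999999999 would first get 'R' and then be overwritten with 'S', because 999 does not fit the [2-9][0-8][0-9] area code pattern. With the ELSE chain it keeps 'R'.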

PG
Frequent Contributor
Posts: 83

Re: PRXMatch question

@PGStats

Your suggestion also makes sense. I think I'll add in the "if/else if" qualifiers and play with it until I get the results I'm looking for.

The longer I stare at the PRXMatch syntax the more it starts making sense.