Ody
Quartz | Level 8

Hello,

 

I recently posted a question about phone number validation, etc, and was referred to this article: https://heuristically.wordpress.com/2012/10/30/phone-number-validation-in-sas/

 

It's a great article and the program runs fine. However, when I look through my results, I'm not getting the flags I would expect based on how this portion of the program is written:

 

/* A=area code not in service */
    if substr(&sm_phone_number, 1, 3) not in (&mv_npa) then &sm_exception = 'A'; 
 
    /* R=repeating number like 5555555555 is a probable fake */
    if prxmatch('/^([0-9])(\1{9})$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'R'; 
 
    /* I=Skype and Google GMail phone numbers do not allow inbound calls */
    if &sm_phone_number in ('2025808200', '7607058888') then &sm_exception = 'I'; 
 
    /* D=directory assistance <https://en.wikipedia.org/wiki/555-1212> */
    if prxmatch('/^[0-9]{3}5551212$/', trim(&sm_phone_number)) eq 1 then &sm_exception = 'D'; 
 
    /* F=numbers specifically reserved for fictional use are "555-0100" through "555-0199" */
    if prxmatch('/^[2-9][0-8][0-9]55501[0-9]{2}$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'F'; /* fake */
 
    /* 1=the last two digits of NXX cannot both be 1, to avoid confusion with the N11 
     * codes (http://en.wikipedia.org/wiki/North_American_Numbering_Plan) 
     * Only non-geographic area codes, such as toll-free 800/888/877/866/855 numbers 
     * and 900 numbers may use N11 as the telephone exchange prefix, since 
     * the area code must always be dialed for these numbers. 
     * <https://en.wikipedia.org/wiki/N11_code> */
    if (prxmatch('/^(800|888|877|866|855|900)/', strip(&sm_phone_number)) ne 1) and
        (prxmatch('/^[2-9][0-8][0-9][2-9]11[0-9]{4}\b/', strip(&sm_phone_number)) eq 1) then &sm_exception = '1';
 
    /* S=basic NANP syntax */
    if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(&sm_phone_number)) ne 1 then &sm_exception = 'S'; 

 

For example, all phone numbers in my dataset with sequential numbers like '000000000' or '9999999998', etc., are being flagged as "S" instead of "R", and I can't figure out why. I've been reading all the literature I can find online about PRXMATCH, but it's not clicking for me.

 

A thought just occurred to me: could this be because my phone number fields are all text?

 

I appreciate any insight into this. Thanks!

5 REPLIES
Haikuo
Onyx | Level 15
Please explain the purpose of your code, such as what is S, A, R? Ideally provide some sample data, expected outcome and the outcome you get for now.
FreelanceReinh
Jade | Level 19

Hello @Ody,

 

You have a series of IF-THEN statements without ELSE statements. So, if one "phone number" satisfies more than one IF condition, it will receive the flag of the last condition met. (The flag is overwritten each time a match is detected.) Hence, every number satisfying the 'S' condition (the last in the list) will get the 'S' flag, regardless of other flags it may have temporarily received before, when the other conditions were checked.
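To make the overwriting visible, here is a minimal sketch that reuses the 'R' and 'S' patterns from the question on a single hard-coded value. The value '9999999999' and the variable names phone and flag are illustrative assumptions, not part of the original macro.

```sas
data _null_;
    length flag $1;
    phone = '9999999999';   /* hypothetical test value: ten identical digits */

    /* First test: the value is ten copies of one digit, so flag becomes 'R' */
    if prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1 then flag = 'R';

    /* Second test runs regardless of the first. The second digit is 9, which
       [0-8] rejects, so the syntax check fails and 'R' is overwritten by 'S' */
    if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1 then flag = 'S';

    put phone= flag=;
run;
```

The log should show flag=S even though the 'R' test matched, because both IF statements are evaluated independently.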

 

Your concrete examples, '000000000' and '9999999998', evidently satisfied the 'S' condition, that is, they failed the basic NANP syntax check. In fact, they do not meet the 'R' condition (although they would receive the 'S' flag even if they did): '000000000' has only 9 digits, but 10 are necessary to get the 'R' flag. '9999999998' does not consist of 10 repeated digits; it has nine 9s and one 8. Even if it had ten 9s and one 8, or eleven 9s (and no 8), it would not meet the 'R' criterion, because the regular expression used there requires the whole value to be exactly 10 copies of the same digit, so it is fairly restrictive.
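If you want to check the 'R' pattern in isolation, a small test step along these lines may help. The variable names phone and is_repeat and the two extra test values are assumptions made for illustration; the pattern itself is copied from the question.

```sas
data _null_;
    input phone :$15.;
    /* 1 if the whole value is one digit followed by nine copies of it, else 0 */
    is_repeat = (prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1);
    put phone= is_repeat=;
    datalines;
000000000
9999999998
9999999999
99999999999
;
run;
```

Only '9999999999' should come back with is_repeat=1. The 9-digit value, the value ending in 8, and the 11-digit value all fail because the anchors ^ and $ force exactly 10 identical digits.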

 

It is fine that your phone numbers are stored as text. Otherwise, the application of the character functions SUBSTR, STRIP, etc. to them would enforce automatic numeric-to-character conversions.
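As a rough illustration of what would happen with a numeric variable, the sketch below compares the automatic conversion with an explicit PUT using the Z10. format. The variable names and the sample value are hypothetical.

```sas
data _null_;
    num_phone = 0123456789;             /* numeric: the leading zero is not stored */
    implicit = strip(num_phone);        /* automatic conversion, uses BEST12. and writes a note to the log */
    explicit = put(num_phone, z10.);    /* explicit conversion, zero-padded to 10 digits */
    put implicit= explicit=;
run;
```

Here implicit should come out as 123456789 (9 characters) and explicit as 0123456789, which is one more reason character storage is the safer choice for phone numbers.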

Ody
Quartz | Level 8
@FreelanceReinhard

Makes sense. I thought that might be the case, so I commented out the last condition. The mostly sequential numbers then ended up flagged "A", which is consistent with the area-code condition.
PGStats
Opal | Level 21

'000000000' doesn't match '/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/' and that test is performed last, so it sets the flag to 'S'.

If you want the flag from the first positive test, you should use an IF / ELSE IF / ELSE IF / ELSE construct.
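A minimal sketch of that construct, using only three of the tests from the question, could look like this. The dataset name flagged, the variable names, the test values, and the order of the tests are all assumptions; put the checks in whatever priority order suits your data.

```sas
data flagged;
    input phone :$15.;
    length flag $1;

    /* Only the first condition that is true sets the flag;
       later tests are skipped for that observation */
    if prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1 then flag = 'R';
    else if prxmatch('/^[0-9]{3}5551212$/', strip(phone)) eq 1 then flag = 'D';
    else if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1 then flag = 'S';

    put phone= flag=;
    datalines;
9999999999
2125551212
000000000
2125550123
;
run;
```

With these inputs the flags should be R, D, S, and blank, in that order. The last value passes the syntax check and none of the other tests included here, so it stays unflagged in this reduced sketch.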

PG
Ody
Quartz | Level 8
@PGStats

Your suggestion also makes sense. I think I'll add in the IF / ELSE IF qualifiers and play with it until I get the results I'm looking for.

The longer I stare at the PRXMatch syntax the more it starts making sense.

