Hello,
I recently posted a question about phone number validation, etc, and was referred to this article: https://heuristically.wordpress.com/2012/10/30/pho
It's a great article and the program works fine. However, when I look through my results, I'm not getting the flags I would expect based on how this portion of the program is written:
/* A=area code not in service */
if substr(&sm_phone_number, 1, 3) not in (&mv_npa) then &sm_exception = 'A';

/* R=repeating number like 5555555555 is a probable fake */
if prxmatch('/^([0-9])(\1{9})$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'R';

/* I=Skype and Google GMail phone numbers do not allow inbound calls */
if &sm_phone_number in ('2025808200', '7607058888') then &sm_exception = 'I';

/* D=directory assistance <https://en.wikipedia.org/wiki/555-1212> */
if prxmatch('/^[0-9]{3}5551212$/', trim(&sm_phone_number)) eq 1 then &sm_exception = 'D';

/* F=numbers specifically reserved for fictional use are "555-0100" through "555-0199" */
if prxmatch('/^[2-9][0-8][0-9]55501[0-9]{2}$/', strip(&sm_phone_number)) eq 1 then &sm_exception = 'F'; /* fake */

/* 1=the last two digits of NXX cannot both be 1, to avoid confusion with the N11
 * codes (http://en.wikipedia.org/wiki/North_American_Numbering_Plan)
 * Only non-geographic area codes, such as toll-free 800/888/877/866/855 numbers
 * and 900 numbers may use N11 as the telephone exchange prefix, since
 * the area code must always be dialed for these numbers.
 * <https://en.wikipedia.org/wiki/N11_code>
 */
if (prxmatch('/^(800|888|877|866|855|900)/', strip(&sm_phone_number)) ne 1)
   and (prxmatch('/^[2-9][0-8][0-9][2-9]11[0-9]{4}\b/', strip(&sm_phone_number)) eq 1)
   then &sm_exception = '1';

/* S=basic NANP syntax */
if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(&sm_phone_number)) ne 1 then &sm_exception = 'S';
For example, all phone numbers in my dataset with sequential numbers like '000000000' or '9999999998', etc., are being flagged as "S" instead of "R", and I can't figure out why. I've been reading all the literature online regarding PRXMATCH, but it's not clicking for me.
A thought just occurred to me: could this be because my phone number fields are all text?
I appreciate any insight into this. Thanks!
Hello @Ody,
You have a series of IF-THEN statements without ELSE statements. So, if one "phone number" satisfies more than one IF condition, it will receive the flag of the last condition met. (The flag is overwritten each time a match is detected.) Hence, every number satisfying the 'S' condition (the last in the list) will get the 'S' flag, regardless of other flags it may have temporarily received before, when the other conditions were checked.
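The overwriting described above can be reproduced in a small DATA step. This is only a sketch: it uses plain variables named phone and exception (hypothetical stand-ins for &sm_phone_number and &sm_exception) and only the 'R' and 'S' tests from the posted code.

```sas
data _null_;
   length phone $10 exception $1;
   phone = '9999999999';   /* ten identical digits */

   /* R test: matches, so the flag is set to 'R' */
   if prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1 then exception = 'R';
   put 'After the R test: ' exception=;

   /* S test: 999 is not a valid NANP area code pattern, so the syntax check
      fails and this independent IF overwrites the flag with 'S' */
   if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1
      then exception = 'S';
   put 'After the S test: ' exception=;
run;
```

The log should show the flag as R after the first test and S after the second, even though the number does satisfy the 'R' condition.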
Your concrete examples, '000000000' and '9999999998', apparently satisfied the 'S' condition. In fact, they do not meet the 'R' condition (but they would receive the 'S' flag even if they did): '000000000' has only 9 digits, but 10 are necessary to get the 'R' flag. '9999999998' does not consist of 10 repeated digits; it has nine 9s and one 8. Even if it had ten 9s and one 8, or eleven 9s (and no 8), it would not meet the 'R' criterion, because the regular expression used there is anchored at both ends and requires the whole value to be exactly 10 repetitions of the same digit, so it is fairly restrictive.
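To check this yourself, you could run the 'R' pattern against a few test values. The following is a minimal sketch; the variable names phone and r_match are made up for the illustration.

```sas
data _null_;
   length phone $12;
   input phone $;
   /* PRXMATCH returns the position where the match starts, or 0 if there is no match */
   r_match = prxmatch('/^([0-9])(\1{9})$/', strip(phone));
   put phone= r_match=;
   datalines;
000000000
9999999998
9999999999
99999999999
;
run;
```

Only the third value (exactly ten 9s) should yield r_match=1; the other three should yield r_match=0.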
It is fine that your phone numbers are stored as text. Otherwise, the application of the character functions SUBSTR, STRIP, etc. to them would enforce automatic numeric-to-character conversions.
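As an illustration of why character storage is preferable here, this sketch assumes a hypothetical numeric variable phone_num and compares the automatic conversion with an explicit one. The automatic conversion uses the BEST12. format, which right-aligns the value, so the first three characters are not the area code.

```sas
data _null_;
   phone_num = 9999999998;   /* hypothetical numeric phone number */

   /* Automatic conversion: BEST12. right-aligns the 10 digits in 12 characters,
      so SUBSTR starts with two leading blanks. SAS also writes a conversion
      note to the log. */
   area_implicit = substr(phone_num, 1, 3);

   /* Explicit conversion with a width of 10 keeps the digits left-aligned */
   area_explicit = substr(put(phone_num, 10.), 1, 3);

   same = (area_implicit = area_explicit);   /* 1 if equal, 0 otherwise */
   put area_explicit= same=;
run;
```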
'000000000' doesn't match '/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/' and that test is performed last, so it sets the flag to 'S'.
If you want the flag from the first positive test, you should use an IF - ELSE IF - ELSE IF - ELSE construct.
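A sketch of such a chain is shown below. It assumes plain variables phone and exception in place of the macro variables, leaves out the 'A' and '1' tests for brevity (the 'A' test needs the &mv_npa list), and uses made-up sample numbers. The order of the tests is a choice: whichever test comes first takes priority.

```sas
data flagged;
   length phone $10 exception $1;
   input phone $;

   /* Only the first condition that is true sets the flag;
      the remaining ELSE branches are skipped */
   if prxmatch('/^([0-9])(\1{9})$/', strip(phone)) eq 1 then exception = 'R';
   else if phone in ('2025808200', '7607058888') then exception = 'I';
   else if prxmatch('/^[0-9]{3}5551212$/', trim(phone)) eq 1 then exception = 'D';
   else if prxmatch('/^[2-9][0-8][0-9]55501[0-9]{2}$/', strip(phone)) eq 1 then exception = 'F';
   else if prxmatch('/^\(?[2-9][0-8][0-9]\)? ?[2-9][0-9]{2}-?[0-9]{4}\b/', strip(phone)) ne 1
      then exception = 'S';
   datalines;
9999999999
2025808200
2125551212
2125550123
000000000
2125557890
;
run;

proc print data=flagged;
run;
```

With this ordering, the sample rows should be flagged R, I, D, F and S respectively, and the last row should keep a blank flag because it passes every test. '000000000' still gets 'S' because it has only nine digits and therefore fails both the 'R' pattern and the syntax check.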