BookmarkSubscribeRSS Feed
Q1983
Lapis Lazuli | Level 10


data have;
length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
input borr_ph -- alt_ph;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;
run;

data want;
set have;
if substr(borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_borr_ph='Y';
if substr(co_borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_co_borr_ph='Y';
if substr(alt_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_alt_ph='Y';
if no_borr_ph ='Y' and no_co_borr_ph='Y' and no_alt_ph = 'Y' then No_Phone = 'Y';
run;

 

I used a substr analysis to try and identify phantom phone numbers (numbers that

obviously are bogus)  I successfully flagged those with the phantom or non-phone number.

  1. My actual code has about 20 different phone numbers.  Is there a way to apply some type

of array so I do not have to repeat the entire sequence of code as seen here.

Note the reason I am using a sequence of first four numbers is because it is unlikely

that a phone number begin with the same 4 numbers consecutive

1 REPLY 1
ballardw
Super User

Not sure exactly why you are testing 4 characters. If this is US and related phone numbers you have two section to check for 000- the first 3 are area code and next three are exchanges. I know from processing phone numbers I have seen invalid exchanges, like 000 when the area code was valid: 208-000-0000 for example.

 

Array processing would take two arrays such as:

data have;
   length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
   input borr_ph -- alt_ph;
   array ph (*) borr_ph -- alt_ph;
   array np (3) no_borr_ph no_co_borr_ph no_alt_ph  ; 
   do i= 1 to dim(ph);
      np[i] = (substr(ph[i],1,4) in('0000','1111','2222','3333','4444','9999')  );
   end;
   drop i;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;

I prefer numeric 1 (true)/0 (false) coded variables instead of 'Y' as there are more reporting options PLUS SAS will return a numeric 1/0 for comparisons. So cuts down on specific "Then var = 'Y';" type codes.

For instance, if you need to know if ALL of the values are "true" instead of 3 (or 20) if var='y' and var2='y' .. you can use:

 

No_phone = min(of np(*) );

Since the NP array elements will be 0 or 1 if at least one of the values has a valid phone (assuming those checks worked correctly) then the Min value for the array will be 0. If all are not valid then the min is 1 , or the "not a phone".

Notice that the " of arrayname(*) " syntax does not care at all about how many variables are in the array. So the length of the syntax doesn't change. Character variables take a bit more work, not much but some.

 

Would suggest looking a bit more into proper numbers. US for example US phones all use one single format: (NXX) NXX-XXXX, where N is 2-9 and X is 0-9. So area code cannot start with 0 or 1, neither can exchange (the second 3 characters).

 

If you really must see a Y/N then a custom format for 1/0 coded values can do that.

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 385 views
  • 0 likes
  • 2 in conversation