BookmarkSubscribeRSS Feed
Q1983
Lapis Lazuli | Level 10


data have;
length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
input borr_ph -- alt_ph;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;
run;

data want;
set have;
if substr(borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_borr_ph='Y';
if substr(co_borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_co_borr_ph='Y';
if substr(alt_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_alt_ph='Y';
if no_borr_ph ='Y' and no_co_borr_ph='Y' and no_alt_ph = 'Y' then No_Phone = 'Y';
run;

 

I used a substr analysis to try and identify phantom phone numbers (numbers that

obviously are bogus)  I successfully flagged those with the phantom or non-phone number.

  1. My actual code has about 20 different phone numbers.  Is there a way to apply some type

of array so I do not have to repeat the entire sequence of code as seen here.

Note the reason I am using a sequence of first four numbers is because it is unlikely

that a phone number begin with the same 4 numbers consecutive

1 REPLY 1
ballardw
Super User

Not sure exactly why you are testing 4 characters. If this is US and related phone numbers you have two section to check for 000- the first 3 are area code and next three are exchanges. I know from processing phone numbers I have seen invalid exchanges, like 000 when the area code was valid: 208-000-0000 for example.

 

Array processing would take two arrays such as:

data have;
   length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
   input borr_ph -- alt_ph;
   array ph (*) borr_ph -- alt_ph;
   array np (3) no_borr_ph no_co_borr_ph no_alt_ph  ; 
   do i= 1 to dim(ph);
      np[i] = (substr(ph[i],1,4) in('0000','1111','2222','3333','4444','9999')  );
   end;
   drop i;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;

I prefer numeric 1 (true)/0 (false) coded variables instead of 'Y' as there are more reporting options PLUS SAS will return a numeric 1/0 for comparisons. So cuts down on specific "Then var = 'Y';" type codes.

For instance, if you need to know if ALL of the values are "true" instead of 3 (or 20) if var='y' and var2='y' .. you can use:

 

No_phone = min(of np(*) );

Since the NP array elements will be 0 or 1 if at least one of the values has a valid phone (assuming those checks worked correctly) then the Min value for the array will be 0. If all are not valid then the min is 1 , or the "not a phone".

Notice that the " of arrayname(*) " syntax does not care at all about how many variables are in the array. So the length of the syntax doesn't change. Character variables take a bit more work, not much but some.

 

Would suggest looking a bit more into proper numbers. US for example US phones all use one single format: (NXX) NXX-XXXX, where N is 2-9 and X is 0-9. So area code cannot start with 0 or 1, neither can exchange (the second 3 characters).

 

If you really must see a Y/N then a custom format for 1/0 coded values can do that.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 608 views
  • 0 likes
  • 2 in conversation