data have;
length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
input borr_ph -- alt_ph;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;
run;
data want;
set have;
if substr(borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_borr_ph='Y';
if substr(co_borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_co_borr_ph='Y';
if substr(alt_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_alt_ph='Y';
if no_borr_ph ='Y' and no_co_borr_ph='Y' and no_alt_ph = 'Y' then No_Phone = 'Y';
run;
I used a substr analysis to try and identify phantom phone numbers (numbers that
obviously are bogus) I successfully flagged those with the phantom or non-phone number.
of array so I do not have to repeat the entire sequence of code as seen here.
Note the reason I am using a sequence of first four numbers is because it is unlikely
that a phone number begin with the same 4 numbers consecutive
Not sure exactly why you are testing 4 characters. If this is US and related phone numbers you have two section to check for 000- the first 3 are area code and next three are exchanges. I know from processing phone numbers I have seen invalid exchanges, like 000 when the area code was valid: 208-000-0000 for example.
Array processing would take two arrays such as:
data have; length borr_ph $10 co_borr_ph $10 alt_ph $10 ; input borr_ph -- alt_ph; array ph (*) borr_ph -- alt_ph; array np (3) no_borr_ph no_co_borr_ph no_alt_ph ; do i= 1 to dim(ph); np[i] = (substr(ph[i],1,4) in('0000','1111','2222','3333','4444','9999') ); end; drop i; datalines; 8042234314 0000000000 3334444444 0000000000 1111111111 9999999999 2132245566 1121111111 2321111111 1111114444 0000000000 1111111111 1112223333 8042342256 0000000000 ;
I prefer numeric 1 (true)/0 (false) coded variables instead of 'Y' as there are more reporting options PLUS SAS will return a numeric 1/0 for comparisons. So cuts down on specific "Then var = 'Y';" type codes.
For instance, if you need to know if ALL of the values are "true" instead of 3 (or 20) if var='y' and var2='y' .. you can use:
No_phone = min(of np(*) );
Since the NP array elements will be 0 or 1 if at least one of the values has a valid phone (assuming those checks worked correctly) then the Min value for the array will be 0. If all are not valid then the min is 1 , or the "not a phone".
Notice that the " of arrayname(*) " syntax does not care at all about how many variables are in the array. So the length of the syntax doesn't change. Character variables take a bit more work, not much but some.
Would suggest looking a bit more into proper numbers. US for example US phones all use one single format: (NXX) NXX-XXXX, where N is 2-9 and X is 0-9. So area code cannot start with 0 or 1, neither can exchange (the second 3 characters).
If you really must see a Y/N then a custom format for 1/0 coded values can do that.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.