data have;
length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
input borr_ph -- alt_ph;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;
run;
data want;
set have;
if substr(borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_borr_ph='Y';
if substr(co_borr_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_co_borr_ph='Y';
if substr(alt_ph,1,4) in('0000','1111','2222','3333','4444','9999') then no_alt_ph='Y';
if no_borr_ph ='Y' and no_co_borr_ph='Y' and no_alt_ph = 'Y' then No_Phone = 'Y';
run;
I used a substr analysis to try and identify phantom phone numbers (numbers that
obviously are bogus) I successfully flagged those with the phantom or non-phone number.
of array so I do not have to repeat the entire sequence of code as seen here.
Note the reason I am using a sequence of first four numbers is because it is unlikely
that a phone number begin with the same 4 numbers consecutive
Not sure exactly why you are testing 4 characters. If this is US and related phone numbers you have two section to check for 000- the first 3 are area code and next three are exchanges. I know from processing phone numbers I have seen invalid exchanges, like 000 when the area code was valid: 208-000-0000 for example.
Array processing would take two arrays such as:
data have;
length borr_ph $10 co_borr_ph $10 alt_ph $10 ;
input borr_ph -- alt_ph;
array ph (*) borr_ph -- alt_ph;
array np (3) no_borr_ph no_co_borr_ph no_alt_ph ;
do i= 1 to dim(ph);
np[i] = (substr(ph[i],1,4) in('0000','1111','2222','3333','4444','9999') );
end;
drop i;
datalines;
8042234314 0000000000 3334444444
0000000000 1111111111 9999999999
2132245566 1121111111 2321111111
1111114444 0000000000 1111111111
1112223333 8042342256 0000000000
;
I prefer numeric 1 (true)/0 (false) coded variables instead of 'Y' as there are more reporting options PLUS SAS will return a numeric 1/0 for comparisons. So cuts down on specific "Then var = 'Y';" type codes.
For instance, if you need to know if ALL of the values are "true" instead of 3 (or 20) if var='y' and var2='y' .. you can use:
No_phone = min(of np(*) );
Since the NP array elements will be 0 or 1 if at least one of the values has a valid phone (assuming those checks worked correctly) then the Min value for the array will be 0. If all are not valid then the min is 1 , or the "not a phone".
Notice that the " of arrayname(*) " syntax does not care at all about how many variables are in the array. So the length of the syntax doesn't change. Character variables take a bit more work, not much but some.
Would suggest looking a bit more into proper numbers. US for example US phones all use one single format: (NXX) NXX-XXXX, where N is 2-9 and X is 0-9. So area code cannot start with 0 or 1, neither can exchange (the second 3 characters).
If you really must see a Y/N then a custom format for 1/0 coded values can do that.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.