data phone_number;
input Phone_Number $ 15.;
standard_phone = catx('-', SUBSTR(PUT(phone_number, Z10.), 1, 3), SUBSTR(PUT(phone_number, Z10.), 4, 2), SUBSTR(PUT(phone_number, Z10.), 6, 4));
datalines;
123-545-5421
1236439775
7066950392
123-543-2345
8766783469
304-762-2467
8766783469
Na
123-545-5421
7066950392
123-543-2345
8766783469
304-762-2467
123-545-5421
1236439775
7066950392
Na
8766783469
8766783469
;
run;
Here i want to clean the data and standard format for phone numbers in the above data
i want miniplates all phone numbers below format
123-543-2345
and also count of each format like below
123-543-2345 totalcount=7
8766783469 totalcount=11
Na total count=2
data phone_number;
input Phone_Number $ 15.;
datalines;
123-545-5421
1236439775
7066950392
123-543-2345
8766783469
304-762-2467
8766783469
Na
123-545-5421
7066950392
123-543-2345
8766783469
304-762-2467
123-545-5421
1236439775
7066950392
Na
8766783469
8766783469
;
data want;
/* Create variable named TYPE for counting purposes */
length type $ 8;
set phone_number;
is_dash=find(phone_number,'-');
if is_dash=0 and upcase(phone_number)^='NA' and not missing(phone_number) then do;
phone_number=cats(substr(phone_number,1,3),'-',substr(phone_number,4,3),'-',substr(phone_number,7));
type='No Dash';
end;
else if upcase(phone_number)='NA' then type='NA';
else if not missing(phone_number) then type='Dash';
run;
proc freq data=want;
tables type;
run;
data phone_number;
input Phone_Number $ 15.;
datalines;
123-545-5421
1236439775
7066950392
123-543-2345
8766783469
304-762-2467
8766783469
Na
123-545-5421
7066950392
123-543-2345
8766783469
304-762-2467
123-545-5421
1236439775
7066950392
Na
8766783469
8766783469
;
data want;
/* Create variable named TYPE for counting purposes */
length type $ 8;
set phone_number;
is_dash=find(phone_number,'-');
if is_dash=0 and upcase(phone_number)^='NA' and not missing(phone_number) then do;
phone_number=cats(substr(phone_number,1,3),'-',substr(phone_number,4,3),'-',substr(phone_number,7));
type='No Dash';
end;
else if upcase(phone_number)='NA' then type='NA';
else if not missing(phone_number) then type='Dash';
run;
proc freq data=want;
tables type;
run;
hi Miller
Thank You for solution
why you used upcase function for phone_number ?
@BrahmanandaRao wrote:
hi Miller
Thank You for solution
why you used upcase function for phone_number ?
perhaps someone types it as NA one time and Na another time and na another time
Brilliant 🤗
@BrahmanandaRao wrote:
hi Miller
Thank You for solution
why you used upcase function for phone_number ?
BR549
Maybe not as common as it used to be but there are letters associated with certain digits of phone numbers. B is 2 (along with A and C) , R is 7 (along with P Q and S). Some organizations pay big money to have a number that "spells" something useful to them.
So you may want to consider that as part of whatever purpose this has.
Remember, if you aren't doing arithmetic with it then it is not a number just a collection of digits.
Nobody knows Junior Samples' phone number.
data phone_number;
input Phone_Number $ 15.;
datalines;
123-545-5421
1236439775
7066950392
123-543-2345
8766783469
304-762-2467
8766783469
Na
123-545-5421
7066950392
123-543-2345
8766783469
304-762-2467
123-545-5421
1236439775
7066950392
Na
8766783469
8766783469
;
proc format;
picture fmt
low-high='000-000-0000'
.=' ';
run;
data want;
set phone_number;
new_phone_number=coalescec(put(input(compress(phone_number,,'kd'),best32.),fmt.),Phone_Number);
run;
proc freq data=want ;
tables new_phone_number/missing;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.