I have a dataset with phone numbers. There are 10 phone number fields for each observation. phone1 to phone10. What I am trying to do is determine if the number in phone1 is unique versus the number in fields phone2 through phone10. Also, is phone2 unique against all the other phone fields, etc. I can do this with a bunch of if statements, but I was thinking there must be a macro way to do this that would be less prone to typing error.
This might be the most flexible approach, for future analysis purposes. It produces 10 new variables (count1-count10), containing a count of how many times the phone number appears within the set of 10.
data want;
set have;
array phone {10};
array count {10};
do i=1 to 10;
count{i}=0;
end;
do i=1 to 10;
do j=1 to 10;
if phone{i} = phone{j} then count{i} + 1;
end;
end;
run;
So uniqueness is identified as a COUNT variable being 1. But you can also look for low counts, not just unique vs. not unique.
With respect to a macro solution versus a data step solution, both would require do loops, and I don't see one as being easier or simpler or less coding than the other.
What is your purpose? What would be your final outcome? Are you de-dup the phone numbers or Are you only select those unique? The best way to explain is to offer a set of HAVE/WANT, along with your descriptions.
I want to get a a number of uniques for each phone variable category by a group of other variables. So breaking it out by day of the week, by agency, and other things. So how many uniques are there for each phone1, .... phone 10.
This will create pairs of values that are same
data test1;
set test;
length x $100;
x="";
array test{*} var1-var10;
do i=1 to dim(test)-1;
do j=1 to dim(test)-i;
if test{i}=test{i+j} then x=strip(x)||" "||strip(put(i,best.))||"-"||strip(put(i+j,best.));
end;
end;
drop i j;
run;
This might be the most flexible approach, for future analysis purposes. It produces 10 new variables (count1-count10), containing a count of how many times the phone number appears within the set of 10.
data want;
set have;
array phone {10};
array count {10};
do i=1 to 10;
count{i}=0;
end;
do i=1 to 10;
do j=1 to 10;
if phone{i} = phone{j} then count{i} + 1;
end;
end;
run;
So uniqueness is identified as a COUNT variable being 1. But you can also look for low counts, not just unique vs. not unique.
Hi. If all you want to do is identify the unique phone numbers in each observation, you could try this ...
proc format;
picture phone (default=14) low-high='999)-999-9999' (prefix='(');
run;
data phone;
input id p1-p10 @@;
datalines;
123
9999999999 1234567890 3234567890 4234567890
5234567890 6234567890 7234567890 9234567890
1234567890 9234567890
234
6234567890 1234567890 9234567890 8234567890
5234567890 6234567890 7234567890 1234567890
1234567890 6234567890
;
data u_phone (keep=id phone);
set phone;
array p(10);
all = catx('|',of p(*));
do i=1 to 10;
if lengthn(tranwrd(all,cat(p(i)),'*')) eq 100 then do; phone=p(i); output; end;
end;
format phone phone.;
run;
data set U_PHONE ...
id=123
Obs phone
1 (999)-999-9999
2 (323)-456-7890
3 (423)-456-7890
4 (523)-456-7890
5 (623)-456-7890
6 (723)-456-7890
id=234
Obs phone
7 (923)-456-7890
8 (823)-456-7890
9 (523)-456-7890
10 (723)-456-7890
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.