Fluorite | Level 6

## Find IDs common to two distinct values

Want to know if there is a better way to approach this problem.

I have a dataset that links a PH number to an account ID. See below:

data have;
infile cards dsd ;
input Phone :\$10. Account \$ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654

I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).

An example of my desired output below:

Phone1,Phone2,Common

123456789,543219876,0

123456789,987654321,3

543219876,987654321,1

The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:

Phone1,Phone2,Common

123456789,543219876,0
123456789,987654321,3
543219876,123456789,0
543219876,987654321,1
987654321,123456789,3
987654321,543219876,1

Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?

proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;

2 REPLIES 2
Obsidian | Level 7

## Re: Find IDs common to two distinct values

The dataset your code created can be modified further (didn't run this but should work):

``````data want;
length ph \$21;
set want;
if input(strip(phone1), best.) > input(strip(phone2), best.) then  ph=phone1||" "||phone2;
else ph=phone2||" "||phone1;
run;

proc sort;
by ph;
run;

data want;
set want;
by ph;
if first.ph;
run;``````
Tourmaline | Level 20

## Re: Find IDs common to two distinct values

Hi @wylie_ma  Keeping it simple albeit I acknowledge is boring-

``````data have;
infile cards dsd ;
input Phone :\$10. Account \$ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
create table common as
select a.*,ifn(common=.,0,common) as common
from
(select * from temp2) a
left join
(select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
from have a, have b
where a.account=b.account and a.phone ne b.phone
group by phone1,phone2) b
on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
set common;
call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
by phone1 phone2;
run;

proc print noobs;run;``````
phone1 phone2 common
123456789 543219876 0
123456789 987654321 3
543219876 987654321 1
Discussion stats
• 2 replies
• 475 views
• 2 likes
• 3 in conversation