topic Re: Find IDs common to two distinct values in SAS Programming

Find IDs common to two distinct values

wylie_ma — Sat, 17 Oct 2020 02:11:34 GMT

Want to know if there is a better way to approach this problem.

I have a dataset that links a PH number to an account ID. See below:

data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654

I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).

An example of my desired output below:

Phone1,Phone2,Common

123456789,543219876,0

123456789,987654321,3

543219876,987654321,1

The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:

Phone1,Phone2,Common

123456789,543219876,0
123456789,987654321,3
543219876,123456789,0
543219876,987654321,1
987654321,123456789,3
987654321,543219876,1

Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?

proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;

Re: Find IDs common to two distinct values

vellad — Sat, 17 Oct 2020 02:43:09 GMT

The dataset your code created can be modified further (didn't run this but should work):

data want;
  length ph $21;
  set want;
  if input(strip(phone1), best.) > input(strip(phone2), best.) then  ph=phone1||" "||phone2;
  else ph=phone2||" "||phone1;
run;

proc sort;
  by ph;
run;

data want;
  set want;
  by ph;
  if first.ph;
run;

Re: Find IDs common to two distinct values

novinosrin — Sat, 17 Oct 2020 20:21:30 GMT

Hi @wylie_ma Keeping it simple albeit I acknowledge is boring-

data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
 merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
 tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
 create table common as
 select a.*,ifn(common=.,0,common) as common
 from
 (select * from temp2) a
 left join
 (select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
 from have a, have b
 where a.account=b.account and a.phone ne b.phone
 group by phone1,phone2) b
 on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
 set common;
 call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
 by phone1 phone2;
run;

proc print noobs;run;

phone1	phone2	common
123456789	543219876	0
123456789	987654321	3
543219876	987654321	1