BookmarkSubscribeRSS Feed
wylie_ma
Fluorite | Level 6

Want to know if there is a better way to approach this problem.

 

I have a dataset that links a PH number to an account ID. See below:


data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654

 

I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).

 

An example of my desired output below:

Phone1,Phone2,Common

123456789,543219876,0

123456789,987654321,3

543219876,987654321,1

 

The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:

 

Phone1,Phone2,Common

123456789,543219876,0
123456789,987654321,3
543219876,123456789,0
543219876,987654321,1
987654321,123456789,3
987654321,543219876,1
 

Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?

 

proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;

 

2 REPLIES 2
vellad
Obsidian | Level 7

The dataset your code created can be modified further (didn't run this but should work):

data want;
  length ph $21;
  set want;
  if input(strip(phone1), best.) > input(strip(phone2), best.) then  ph=phone1||" "||phone2;
  else ph=phone2||" "||phone1;
run;

proc sort;
  by ph;
run;

data want;
  set want;
  by ph;
  if first.ph;
run;
novinosrin
Tourmaline | Level 20

Hi @wylie_ma  Keeping it simple albeit I acknowledge is boring-

data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
 merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
 tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
 create table common as
 select a.*,ifn(common=.,0,common) as common
 from
 (select * from temp2) a
 left join
 (select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
 from have a, have b
 where a.account=b.account and a.phone ne b.phone
 group by phone1,phone2) b
 on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
 set common;
 call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
 by phone1 phone2;
run;

proc print noobs;run;
phone1 phone2 common
123456789 543219876 0
123456789 987654321 3
543219876 987654321 1

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 939 views
  • 2 likes
  • 3 in conversation