BookmarkSubscribeRSS Feed
wylie_ma
Fluorite | Level 6

Want to know if there is a better way to approach this problem.

 

I have a dataset that links a PH number to an account ID. See below:


data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654

 

I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).

 

An example of my desired output below:

Phone1,Phone2,Common

123456789,543219876,0

123456789,987654321,3

543219876,987654321,1

 

The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:

 

Phone1,Phone2,Common

123456789,543219876,0
123456789,987654321,3
543219876,123456789,0
543219876,987654321,1
987654321,123456789,3
987654321,543219876,1
 

Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?

 

proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;

 

2 REPLIES 2
vellad
Obsidian | Level 7

The dataset your code created can be modified further (didn't run this but should work):

data want;
  length ph $21;
  set want;
  if input(strip(phone1), best.) > input(strip(phone2), best.) then  ph=phone1||" "||phone2;
  else ph=phone2||" "||phone1;
run;

proc sort;
  by ph;
run;

data want;
  set want;
  by ph;
  if first.ph;
run;
novinosrin
Tourmaline | Level 20

Hi @wylie_ma  Keeping it simple albeit I acknowledge is boring-

data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
 merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
 tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
 create table common as
 select a.*,ifn(common=.,0,common) as common
 from
 (select * from temp2) a
 left join
 (select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
 from have a, have b
 where a.account=b.account and a.phone ne b.phone
 group by phone1,phone2) b
 on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
 set common;
 call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
 by phone1 phone2;
run;

proc print noobs;run;
phone1 phone2 common
123456789 543219876 0
123456789 987654321 3
543219876 987654321 1

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 475 views
  • 2 likes
  • 3 in conversation