Want to know if there is a better way to approach this problem.
I have a dataset that links a PH number to an account ID. See below:
data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).
An example of my desired output below:
Phone1,Phone2,Common
123456789,543219876,0
123456789,987654321,3
543219876,987654321,1
The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:
Phone1,Phone2,Common
Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?
proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;
The dataset your code created can be modified further (didn't run this but should work):
data want;
length ph $21;
set want;
if input(strip(phone1), best.) > input(strip(phone2), best.) then ph=phone1||" "||phone2;
else ph=phone2||" "||phone1;
run;
proc sort;
by ph;
run;
data want;
set want;
by ph;
if first.ph;
run;
Hi @wylie_ma Keeping it simple albeit I acknowledge is boring-
data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
create table common as
select a.*,ifn(common=.,0,common) as common
from
(select * from temp2) a
left join
(select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
from have a, have b
where a.account=b.account and a.phone ne b.phone
group by phone1,phone2) b
on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
set common;
call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
by phone1 phone2;
run;
proc print noobs;run;
phone1 | phone2 | common |
---|---|---|
123456789 | 543219876 | 0 |
123456789 | 987654321 | 3 |
543219876 | 987654321 | 1 |
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.