Want to know if there is a better way to approach this problem.
I have a dataset that links a PH number to an account ID. See below:
data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
I would like to create a dataset that counts the amount of common account IDs between two PH numbers (I would normally create an adjacency list in NetworkX/Python or iGraph/R to do this, but I cannot use either these technologies for this exercise).
An example of my desired output below:
Phone1,Phone2,Common
123456789,543219876,0
123456789,987654321,3
543219876,987654321,1
The code I have created (see below) kind of does this. However it obviously duplicates the rows and creates following output:
Phone1,Phone2,Common
Thus wondering if there is a more efficient way to do this, perhaps using a loop in a data step?
proc sql;
create table BlahBlah
as select distinct a.Phone as A, b.Phone as B,
sum(case when a.Account = b.Account then 1 else 0 end) as Common
from have a
inner join have b on a.Phone <> b.Phone
group by 1,2
; quit;
The dataset your code created can be modified further (didn't run this but should work):
data want;
length ph $21;
set want;
if input(strip(phone1), best.) > input(strip(phone2), best.) then ph=phone1||" "||phone2;
else ph=phone2||" "||phone1;
run;
proc sort;
by ph;
run;
data want;
set want;
by ph;
if first.ph;
run;
Hi @wylie_ma Keeping it simple albeit I acknowledge is boring-
data have;
infile cards dsd ;
input Phone :$10. Account $ ;
cards;
123456789, 12345
123456789, 54321
123456789, 67890
123456789, 09876
987654321, 12345
987654321, 54321
987654321, 45879
987654321, 94532
987654321, 67890
543219876, 45879
543219876, 54873
543219876, 56454
543219876, 87654
;
/*All pairs*/
data temp/view=temp;
merge have(rename=(phone=phone1 account=account1)) have(rename=(phone=phone2 account=account2));
run;
proc freq data=temp noprint;
tables phone1*phone2/ sparse out=temp2(where=(phone1 ne phone2) keep=phone:);
run;
/* Compute common*/
proc sql;
create table common as
select a.*,ifn(common=.,0,common) as common
from
(select * from temp2) a
left join
(select a.phone as phone1,b.phone as phone2,sum(a.account= b.account) as common
from have a, have b
where a.account=b.account and a.phone ne b.phone
group by phone1,phone2) b
on a.phone1=b.phone1 and a.phone2=b.phone2;
quit;
/*get the Phone pairs order to remove dups in the next step*/
data _common;
set common;
call sortc(phone1,phone2);
run;
/*Final step*/
proc sort data=_common out=want nodupkey;
by phone1 phone2;
run;
proc print noobs;run;
phone1 | phone2 | common |
---|---|---|
123456789 | 543219876 | 0 |
123456789 | 987654321 | 3 |
543219876 | 987654321 | 1 |
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.