Hello
I have a dataset with customers and their address, and another one with customers and any customer they are related to (i.e. married, children of, grandparents ect.)
I need to identify how many relations each customer has with anyone else at the address.
Below are dummy tables I've created: so for example, John has 3 relations with people at the same address (with Mary, Louise and Paul at address 100), while Mary only 1 and so on. Tommy and Angela are at the same address but have no relation, therefore they will be excluded from the final output.
The below code works fine and gives me the output I need (I also need to ensure only people at the same address are considered, but I can do with some joins at the end I guess).
My issue is that this code would take hours to run against the real data with hundreds of thousand records.
is there any way to improve this? (I was looking at using arrays as an alternative, but could no make it work)
/*dummy table with address Id and customer*/
data ADDR_CUST;
input addr_id cust_name $;
datalines;
100 JOHN
100 MARY
100 LOUISE
100 PAUL
101 LINDA
101 JACK
101 ADAM
102 TOMMY
102 ANGELA
;
proc print data=ADDR_CUST;
run;
/*dummy table with customer relationship*/
title " ";
data CUST_REL;
input cust_name $ cust_rel_name $;
datalines;
JOHN MARY
JOHN LOUISE
JOHN PAUL
MARY JOHN
MARY LOUISE
MARY PAUL
LOUISE JOHN
LOUISE PAUL
PAUL JOHN
PAUL LOUISE
LINDA JACK
LINDA ADAM
JACK LINDA
JACK ADAM
ADAM LINDA
ADAM JACK
TOMMY .
ANGELA .
;
proc print data=CUST_REL;
run;
title " ";
/*create mASter table where the macro will add records*/
PROC SQL;
CREATE TABLE WORK.CUS_REL_MASTER
(
addr_id numeric (3),
cust_name char(10),
cust_name2 char(10),
cust_rel_name char(10)
);
QUIT;
/*run macro*/
%macro makereport (addr_id=, cust_name=);
PROC SQL;
CREATE TABLE WORK.CUS_REL_MASTER_AUX AS
select
t1.addr_id,
t2.cust_name,
t2.cust_rel_name
FROM WORK.ADDR_CUST t1
inner join WORK.CUST_REL t2
on t1.cust_name = t2.cust_rel_name
where t1.cust_name = "&cust_name"
and t1.addr_id = &addr_id
;
QUIT;
proc append base=WORK.CUS_REL_MASTER data=WORK.CUS_REL_MASTER_AUX force;
run;
%mend makereport;
/*create the variables for the macro*/
data ;
set WORK.ADDR_CUST;
str = catt('%makereport(cust_name=', cust_name,',addr_id=',addr_id,');');
call execute(str);
run;
/*final output showing the number of relations for each customer*/
title1 "Final output - Number of relations for each customer";
PROC SQL;
SELECT t1.addr_id,
t1.cust_rel_name,
count(DISTINCT(t1.cust_name)) AS Number_of_Relations
FROM WORK.CUS_REL_MASTER t1
WHERE t1.cust_rel_name IS NOT missing
GROUP BY t1.addr_id,
t1.cust_rel_name;
QUIT;
thanks
... View more