BookmarkSubscribeRSS Feed
MART1
Quartz | Level 8

Hello

 

I have a dataset with customers and their address, and another one with customers and any customer they are related to (i.e. married, children of, grandparents ect.)

 

I need to identify how many relations each customer has with anyone else at the address.

 

Below are dummy tables I've created: so for example, John has 3 relations with people at the same address (with Mary, Louise and Paul at address 100), while Mary only 1 and so on. Tommy and Angela are at the same address but have no relation, therefore they will be excluded from the final output.

38b8f3f6-a402-49e8-a202-157d530252f1.png

The below code works fine and gives me the output I need (I also need to ensure only people at the same address are considered, but I can do with some joins at the end I guess).

My issue is that this code would take hours to run against the real data with hundreds of thousand records.

 

 

is there any way to improve this? (I was looking at using arrays as an alternative, but could no make it work)

 

/*dummy table with address Id and customer*/
data ADDR_CUST;
	input addr_id cust_name $;
	datalines;
100 JOHN
100 MARY
100 LOUISE
100 PAUL
101 LINDA
101 JACK
101 ADAM
102 TOMMY
102 ANGELA
;

proc print data=ADDR_CUST;
run;

/*dummy table with customer relationship*/
title " ";
data CUST_REL;
	input cust_name $ cust_rel_name $;
	datalines;
JOHN MARY
JOHN LOUISE
JOHN PAUL
MARY JOHN
MARY LOUISE
MARY PAUL
LOUISE JOHN
LOUISE PAUL
PAUL JOHN
PAUL LOUISE
LINDA JACK
LINDA ADAM
JACK LINDA
JACK ADAM
ADAM LINDA
ADAM JACK
TOMMY .
ANGELA .
;

proc print data=CUST_REL;
run;
title " ";

/*create mASter table where the macro will add records*/
PROC SQL;
	CREATE TABLE WORK.CUS_REL_MASTER
		(
		addr_id numeric (3),
		cust_name char(10),
		cust_name2 char(10),
		cust_rel_name char(10)
		);
QUIT;

/*run macro*/
%macro makereport (addr_id=, cust_name=);
PROC SQL;
	CREATE TABLE WORK.CUS_REL_MASTER_AUX AS 
		select 
		    t1.addr_id,
			t2.cust_name,
            t2.cust_rel_name
		FROM WORK.ADDR_CUST t1
			inner join WORK.CUST_REL t2
				on t1.cust_name = t2.cust_rel_name
			where t1.cust_name = "&cust_name"
				and t1.addr_id = &addr_id
;
				QUIT;

proc append base=WORK.CUS_REL_MASTER data=WORK.CUS_REL_MASTER_AUX force;
run;

%mend makereport;

/*create the variables for the macro*/
data ;
	set WORK.ADDR_CUST;
	str = catt('%makereport(cust_name=', cust_name,',addr_id=',addr_id,');');
		call execute(str);
run;

/*final output showing the number of relations for each customer*/
title1 "Final output - Number of relations for each customer";
PROC SQL;
SELECT t1.addr_id, 
          t1.cust_rel_name, 
            count(DISTINCT(t1.cust_name)) AS Number_of_Relations
      FROM WORK.CUS_REL_MASTER t1
	  WHERE t1.cust_rel_name IS NOT missing
      GROUP BY t1.addr_id,
               t1.cust_rel_name;
QUIT;


thanks

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

2 questions.

 

  • What does your desired result look like? Please post that as well. Makes it much easier to provide a usable code answer. 
  • I think you would need an addr_id in the CUST_REL table. Otherwise, how would you know what address the JOHN-PAUL relation comes from if another address also has a John and a Paul?
MART1
Quartz | Level 8

Hi @PeterClemmensen 

 

thanks for your reply.

 

The output I'd need is the one created in the last output titled "Final output - Number of relations for each customer"

(I'll then have to ensure only customers at the same address are considered, but that should be doable with some further steps.)

 

Re adding the addr_id in the CUST_REL table; yes it can be done but possibly not needed. I'm using names to simplify the code to show; in reality I have customer Id which are unique (so there will only be one John and one Paul)

 

thanks for looking into this

 

PeterClemmensen
Tourmaline | Level 20

Ok. I still need to understand this. For example, why does Louise have 3 relations? I can only count 2?

MART1
Quartz | Level 8

ah, you are right - in the CUS_REL I missed the LOUISE MARY record - now added below

/*dummy table with customer relationship*/
data CUST_REL;
	input cust_name $ cust_rel_name $;
	datalines;
JOHN MARY
JOHN LOUISE
JOHN PAUL
MARY JOHN
MARY LOUISE
MARY PAUL
LOUISE JOHN
LOUISE MARY
LOUISE PAUL
PAUL JOHN
PAUL LOUISE
LINDA JACK
LINDA ADAM
JACK LINDA
ADAM LINDA
JACK ADAM
ADAM JACK
TOMMY .
ANGELA .
;

proc print data=CUST_REL;
run

 

thanks for spotting it, my mistake when creating the dummy table

 

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 575 views
  • 0 likes
  • 2 in conversation