DATA Step, Macro, Functions and more

Cartesian Product within Groups without Repetition

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Cartesian Product within Groups without Repetition

[ Edited ]

I have a dataset with more than one record per customer. I want to get output in which for every customer, there is every possible combinations between its records, however without repetition, i.e. once combining record 1 and 2 for customer 111 the combination 2 and 1 should not be added as it is considered repetition. Similarly records 1 and 1 should not be combined as well.

 

With this example data, the output should result in 6 rows. However, I've only got it to the point in which repetitions are included and the output is 8 rows.

data test;
input Cust_ID Record_ID ;
datalines;
111 1 
111 2 
222 3 
222 4 
222 5 
;

proc sql;                  
  create table test2 as    
    select a.*, b.Record_ID as ID2                           
    from test a 
	left join test b
	on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ;              
quit;  

Accepted Solutions
Solution
‎01-25-2018 08:00 AM
Trusted Advisor
Posts: 1,822

Re: Cartesian Product within Groups without Repetition

Posted in reply to KonstantinVasil

I'm not sure but I have the feeling that you need replace line:

on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ;

with

on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;

View solution in original post


All Replies
Solution
‎01-25-2018 08:00 AM
Trusted Advisor
Posts: 1,822

Re: Cartesian Product within Groups without Repetition

Posted in reply to KonstantinVasil

I'm not sure but I have the feeling that you need replace line:

on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ;

with

on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;
Trusted Advisor
Posts: 1,822

Re: Cartesian Product within Groups without Repetition

[ Edited ]

Running your code as is result into 6 records (not 8), 

but in two of them ID2 is missing.

Is it acceptable ?

 

Otherwise change the sql to - 

proc sql;                  
  create table test2(where=(ID2 is not missing)) as    
    select a.*, b.Record_ID as ID2                           
    from test a 
	left join test b
	on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;              
quit;  

this will reult into 4 records:

   

111 1 2
222 3 4
222 3 5
222 4 5

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 98 views
  • 1 like
  • 2 in conversation