I have a dataset with more than one record per customer. I want to get output in which for every customer, there is every possible combinations between its records, however without repetition, i.e. once combining record 1 and 2 for customer 111 the combination 2 and 1 should not be added as it is considered repetition. Similarly records 1 and 1 should not be combined as well.
With this example data, the output should result in 6 rows. However, I've only got it to the point in which repetitions are included and the output is 8 rows.
data test; input Cust_ID Record_ID ; datalines; 111 1 111 2 222 3 222 4 222 5 ; proc sql; create table test2 as select a.*, b.Record_ID as ID2 from test a left join test b on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ; quit;
I'm not sure but I have the feeling that you need replace line:
on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ;
with
on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;
I'm not sure but I have the feeling that you need replace line:
on a.Cust_ID = b.Cust_ID and a.record_id ^= b.record_id ;
with
on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;
Running your code as is result into 6 records (not 8),
but in two of them ID2 is missing.
Is it acceptable ?
Otherwise change the sql to -
proc sql;
create table test2(where=(ID2 is not missing)) as
select a.*, b.Record_ID as ID2
from test a
left join test b
on a.Cust_ID = b.Cust_ID and a.record_id < b.record_id ;
quit;
this will reult into 4 records:
111 1 2 222 3 4 222 3 5 222 4 5
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.