Hello
I have a data set with list of customers.
For each customer there are 3 columns:
Customer_ID
entity_ID
Person_IDs (It is concatenation of people ID's who are owner )
As you can see specific entity can have multiple Customer_IDs
My target-
For each row (Customer_ID) check if there is any other Customer_ID with similar Person_IDs(but not same!!) that belong to different entity_ID . (It should be binary Var get value 1/0).
When I say similar Person_IDs I mean that at least one ID is same but not all of them.
I also want to add another new field that for each customer_ID concatenate the customer_IDs that were found as similar (similar personIDs and different entity)
For example:
For Customer_ID=1(Id's 999,888) we can see that there is another customer_ID(6) that have ownership with Id's 888 and can see that Person_IDs are not equal exactly but at least one of them is equal . so here new var will get value 1
For Customer_ID=2 same like for For Customer_ID=1 (new var will get value 1)
For Customer_ID=3 new var will get value 1 because another customer (7) have similar Person_IDs and different entity
For customer_ID=8 new var will get value 0
For customer_ID=9 new var will get value 0
Please note that in real world my data base contain 2 million rows.
I thought to do cartersain merge but i think it is not good because too many rows!!
Data have;
Input Customer_ID entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
As a first step, expand those combined person_ids to multiple observations with a single person_id. From there, it is easy to formulate the join condition.
proc sql;
create table mult as
select distinct customer_id
from have a, have b
where
a.customer_id = b.customer_id and
a.person_id = b.person_id and
a.entity_id ne b.entity_id
;
quit;
data want;
merge
have (in=h)
mult (in=m)
;
by customer_id;
value = m;
if last.customer_id;
run;
Data have;
Input Customer_ID entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
data temp;
set have;
do i=1 to countw(Person_IDs,',');
k=scan(Person_IDs,i,',');output;
end;
keep k entity_ID;
run;
proc sql;
create table k as
select distinct k,entity_ID as _entity_ID from temp;
quit;
data want;
if _n_=1 then do;
if 0 then set k;
declare hash h(dataset:'k',multidata:'y',hashexp:20);
h.definekey('k');
h.definedata('_entity_ID');
h.definedone();
end;
set have;
want=0;
do i=1 to countw(Person_IDs,',');
k=scan(Person_IDs,i,',');
rc=h.find();
do while(rc=0);
if entity_ID ne _entity_ID then do;want=1;leave;end;
rc=h.find_next();
end;
if want then leave;
end;
drop i k rc _entity_ID;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.