Hello
I have a data set with list of customers.
For each customer there are 3 columns:
Customer_ID
entity_ID
Person_IDs (It is concatenation of people ID's who are owner )
As you can see specific entity can have multiple Customer_IDs
My target-
For each row (Customer_ID) check if there is any other Customer_ID with similar Person_IDs(but not same!!) that belong to different entity_ID . (It should be binary Var get value 1/0).
When I say similar Person_IDs I mean that at least one ID is same but not all of them.
I also want to add another new field that for each customer_ID concatenate the customer_IDs that were found as similar (similar personIDs and different entity)
For example:
For Customer_ID=1(Id's 999,888) we can see that there is another customer_ID(6) that have ownership with Id's 888 and can see that Person_IDs are not equal exactly but at least one of them is equal . so here new var will get value 1
For Customer_ID=2 same like for For Customer_ID=1 (new var will get value 1)
For Customer_ID=3 new var will get value 1 because another customer (7) have similar Person_IDs and different entity
For customer_ID=8 new var will get value 0
For customer_ID=9 new var will get value 0
Please note that in real world my data base contain 2 million rows.
I thought to do cartersain merge but i think it is not good because too many rows!!
Data have;
Input Customer_ID entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
As a first step, expand those combined person_ids to multiple observations with a single person_id. From there, it is easy to formulate the join condition.
proc sql;
create table mult as
select distinct customer_id
from have a, have b
where
a.customer_id = b.customer_id and
a.person_id = b.person_id and
a.entity_id ne b.entity_id
;
quit;
data want;
merge
have (in=h)
mult (in=m)
;
by customer_id;
value = m;
if last.customer_id;
run;
Data have;
Input Customer_ID entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
data temp;
set have;
do i=1 to countw(Person_IDs,',');
k=scan(Person_IDs,i,',');output;
end;
keep k entity_ID;
run;
proc sql;
create table k as
select distinct k,entity_ID as _entity_ID from temp;
quit;
data want;
if _n_=1 then do;
if 0 then set k;
declare hash h(dataset:'k',multidata:'y',hashexp:20);
h.definekey('k');
h.definedata('_entity_ID');
h.definedone();
end;
set have;
want=0;
do i=1 to countw(Person_IDs,',');
k=scan(Person_IDs,i,',');
rc=h.find();
do while(rc=0);
if entity_ID ne _entity_ID then do;want=1;leave;end;
rc=h.find_next();
end;
if want then leave;
end;
drop i k rc _entity_ID;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.