BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I have a data set with list of customers.

For each customer there are 3 columns:

Customer_ID

entity_ID  

Person_IDs  (It is concatenation of people ID's who are owner )

 

As you can see  specific entity can have multiple Customer_IDs

My target-

For each row (Customer_ID) check if there is any other Customer_ID with similar Person_IDs(but not same!!) that belong to different entity_ID . (It should be binary Var get value 1/0).

When I say similar Person_IDs I mean that at least one ID is same but not all of them.

 

I also want to add another new field that  for each customer_ID  concatenate the customer_IDs that were found as similar (similar personIDs and different entity)

For example:

For Customer_ID=1(Id's 999,888) we can see that there is another customer_ID(6) that have ownership with Id's 888 and can see that Person_IDs are not equal exactly but at least one of them is equal . so here new var will get value 1

 

For Customer_ID=2  same like for For Customer_ID=1  (new var will get value 1)

 

For Customer_ID=3 new var will get value 1 because another customer (7) have similar Person_IDs and different entity

 

For customer_ID=8 new var will get value 0

 

For customer_ID=9 new var will get value 0

 

Please note that in real world my data base contain 2 million rows.

I thought to do cartersain  merge but i think it is not good because too many rows!!

Data have;
Input Customer_ID  entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;

 

 

 

 

 

 

2 REPLIES 2
Kurt_Bremser
Super User

As a first step, expand those combined person_ids to multiple observations with a single person_id. From there, it is easy to formulate the join condition.

proc sql;
create table mult as
  select distinct customer_id
  from have a, have b
  where
    a.customer_id = b.customer_id and
    a.person_id = b.person_id and
    a.entity_id ne b.entity_id
;
quit;

data want;
merge
  have (in=h)
  mult (in=m)
;
by customer_id;
value = m;
if last.customer_id;
run;
Ksharp
Super User
Data have;
Input Customer_ID  entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
data temp;
set have;
do i=1 to countw(Person_IDs,',');
 k=scan(Person_IDs,i,',');output;
end;
keep k entity_ID;
run;
proc sql;
create table k as
select distinct k,entity_ID as _entity_ID from temp;
quit;
data want;
 if _n_=1 then do;
  if 0 then set k;
  declare hash h(dataset:'k',multidata:'y',hashexp:20);
  h.definekey('k');
  h.definedata('_entity_ID');
  h.definedone();
 end;
set have;
want=0;
do i=1 to countw(Person_IDs,',');
 k=scan(Person_IDs,i,',');
 rc=h.find();
 do while(rc=0);
   if entity_ID ne _entity_ID then do;want=1;leave;end;
   rc=h.find_next();
 end;
 if want then leave;
end;
drop i k rc _entity_ID;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 472 views
  • 1 like
  • 3 in conversation