BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I have a data set with list of customers.

For each customer there are 3 columns:

Customer_ID

entity_ID  

Person_IDs  (It is concatenation of people ID's who are owner )

 

As you can see  specific entity can have multiple Customer_IDs

My target-

For each row (Customer_ID) check if there is any other Customer_ID with similar Person_IDs(but not same!!) that belong to different entity_ID . (It should be binary Var get value 1/0).

When I say similar Person_IDs I mean that at least one ID is same but not all of them.

 

I also want to add another new field that  for each customer_ID  concatenate the customer_IDs that were found as similar (similar personIDs and different entity)

For example:

For Customer_ID=1(Id's 999,888) we can see that there is another customer_ID(6) that have ownership with Id's 888 and can see that Person_IDs are not equal exactly but at least one of them is equal . so here new var will get value 1

 

For Customer_ID=2  same like for For Customer_ID=1  (new var will get value 1)

 

For Customer_ID=3 new var will get value 1 because another customer (7) have similar Person_IDs and different entity

 

For customer_ID=8 new var will get value 0

 

For customer_ID=9 new var will get value 0

 

Please note that in real world my data base contain 2 million rows.

I thought to do cartersain  merge but i think it is not good because too many rows!!

Data have;
Input Customer_ID  entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;

 

 

 

 

 

 

2 REPLIES 2
Kurt_Bremser
Super User

As a first step, expand those combined person_ids to multiple observations with a single person_id. From there, it is easy to formulate the join condition.

proc sql;
create table mult as
  select distinct customer_id
  from have a, have b
  where
    a.customer_id = b.customer_id and
    a.person_id = b.person_id and
    a.entity_id ne b.entity_id
;
quit;

data want;
merge
  have (in=h)
  mult (in=m)
;
by customer_id;
value = m;
if last.customer_id;
run;
Ksharp
Super User
Data have;
Input Customer_ID  entity_ID Person_IDs $;
cards;
1 1 999,888
2 1 999,888
3 2 777
4 3 666
5 4 222,111
6 5 888
7 7 777,333
8 0 444,555
9 0 444,555
;
Run;
data temp;
set have;
do i=1 to countw(Person_IDs,',');
 k=scan(Person_IDs,i,',');output;
end;
keep k entity_ID;
run;
proc sql;
create table k as
select distinct k,entity_ID as _entity_ID from temp;
quit;
data want;
 if _n_=1 then do;
  if 0 then set k;
  declare hash h(dataset:'k',multidata:'y',hashexp:20);
  h.definekey('k');
  h.definedata('_entity_ID');
  h.definedone();
 end;
set have;
want=0;
do i=1 to countw(Person_IDs,',');
 k=scan(Person_IDs,i,',');
 rc=h.find();
 do while(rc=0);
   if entity_ID ne _entity_ID then do;want=1;leave;end;
   rc=h.find_next();
 end;
 if want then leave;
end;
drop i k rc _entity_ID;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 247 views
  • 0 likes
  • 3 in conversation