DATA Step, Macro, Functions and more

Finding and grouping items related by many to many

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Finding and grouping items related by many to many


I have data that is related via a many to many relationship and I want to find all items that are related to each other and give them a unique id.

My example data is listed below:

PolicyId   PersonId

1          1000

1          1001

2          1001

3          1000

4          1002

2          1003

5          1003

4          1004

My desired output would either be

PolicyId   PersonId     GroupId

1          1000          1

1          1001          1

2          1001          1

3          1000          1

4          1002          2

2          1003          1

5          1003          1

4          1004          2

or

PolicyID     GroupId

1               1

2               1

3               1

4               2

5               1

PersonId     GroupId

1000          1

1001          1

1002          2

1003          1

1004          2

Do you have any suggestions on how to do this efficiently? 

I have managed to code this using a Macro, which for each policy, finds all related persons, then for each person, recursively calls the macro again for all related policies (that have not been assigned a GroupId already).  This works ok for a small number of values, but I have 1.9m observations in my dataset, so it performs very slowly.  There a some performance gains by calculating the cases where they is only one-to-many relationship between Policy and Person in a separate step, i.e. Policy 4 above would not be passed to the macro.  This means the number I'm checking reduces to about 174K. 


Accepted Solutions
Solution
‎08-18-2015 09:46 AM
Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

OK. I will get these three tables for you . ( use my original code )

Code: Program

data have;
input from $  to $ ;
cards;
1 1000
1 1001
2 1001
3 1000
4 1002
2 1003
5 1003
4 1004
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
   h.definekey('node');
   h.definedata('node');
   h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node; output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();


do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
   key=to;ry=_ha.check();
   if ry ne 0 then do;
   node=to;output;rr=no.remove(key:node);
   key=to;_ha.add();
   count+1;
   last=to;ha.add();
   end;
   rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

data want1 ;
if _n_ eq 1 then do;
  if 0 then set want;
  declare hash ha(dataset:'want');
  ha.definekey('node');
  ha.definedata('household');
  ha.definedone();
end;
set have;
ha.find(key:from);
run;
proc sort data=want1(drop=to) out=want2 nodupkey;by from household;run;
proc sort data=want1(drop=from) out=want3 nodupkey;by to household;run;

Xia Keshan

View solution in original post


All Replies
Super Contributor
Posts: 345

Re: Finding and grouping items related by many to many

You should have posted your sas code.

The following code uses two hash objects:

data have;

input PolicyId   PersonId;

datalines;

1 1000

1 1001

2 1001

3 1000

4 1002

2 1003

5 1003

4 1004

;

run;

data _null_;

   set work.have end=last;

   length GroupId NextGroupId 8;

   retain NextGroupId 1;

   if _n_ = 1 then do;

      declare hash policies(ordered: 'yes');

      policies.defineKey('PolicyId');

      policies.defineData('PolicyId', 'GroupId');

      policies.defineDone();

      declare hash persons(ordered: 'yes');

      persons.defineKey('PersonId');

      persons.defineData('PersonId', 'GroupId');

      persons.defineDone();

   end;

   if policies.find() = 0 then do;

      /* PolicyID is already mapped to a Group */

      if persons.check() ^= 0 then do;

         /* but the personId is not mapped to the group */

         persons.add();

      end;

   end;

   else do;

      if persons.find() ^= 0 then do;

         /* no mappings exist */

         GroupId = NextGroupId;

         NextGroupId = NextGroupId + 1;

         persons.add();

         policies.add();

      end;

      else do;

         /* person-group mapping exists, but policy-group-mapping is missing */

         policies.add();

      end;

   end;

   if last then do;

      policies.output(dataset: 'work.PolicyGroupMapping');

      persons.output(dataset: 'work.PersonGroupMapping');

   end;

run;

Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

Posted in reply to andreas_lds

Sorry, @andreas_lds . I am not tended to pick you up . Just curious your simple code can do such complicated task.

But your code would take a look forwards. See the following code , they all should be together.

data have;

input PolicyId   PersonId;

datalines;

0 1009

0 1005

1 1000

1 1001

1 1002

2 1001

3 1000

4 1002

2 1003

2 1004

5 1003

4 1005

;

run;

Super Contributor
Posts: 345

Re: Finding and grouping items related by many to many

You are right, my code does not provide the correct result for your example.

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

Posted in reply to andreas_lds

Yes, you're correct that I should've posted my code, but I was trying to come up with an example that wasn't using our company specific column names.  I will try your code a test on my full dataset and let you know.  I will also try add my data to the this discussion, (if I can work out how to quickly change the values to mask, as I'm using company sensitive data and also size permitting.  In fact our compliance area might not allow me to post)

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

Posted in reply to andreas_lds

This works beautifully.  My code went from taking 16 hours to run to 14 seconds!

Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

Check

Code: Program

data have;
input from $  to $ ;
cards;
1 1000
1 1001
2 1001
3 1000
4 1002
2 1003
5 1003
4 1004
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
   h.definekey('node');
   h.definedata('node');
   h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node; output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;




data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp:20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node',hashexp:20);
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
/***********/
if 0 then set have;
declare hash k(dataset:'have');
k.definekey('from');
k.definedone();
/***********/

do while(hi_no.next()=0);
household+1; if k.check(key:node) ne 0 then output; /*********/
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
   key=to;ry=_ha.check();
   if ry ne 0 then do;
   node=to;if k.check(key:node) ne 0 then output; /*********/
   rr=no.remove(key:node);
   key=to;_ha.add();
   count+1;
   last=to;ha.add();
   end;
   rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

Xia code works for his example, but only produces one of the datasets I'm looking for.

Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

Can't you use that table to get another two tables ? That is easy I think.

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

Xia's code does not work if the 'from' has the same values as 'to'

e.g.


data have;

input from $ to $;

cards;

10 10

10 11

1  10

2  20

;

run;


Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

Are you sure ? OP is asking three tables. I just gave him one of them.

Check

Finde out who belong to the same household

Solution
‎08-18-2015 09:46 AM
Super User
Posts: 10,023

Re: Finding and grouping items related by many to many

OK. I will get these three tables for you . ( use my original code )

Code: Program

data have;
input from $  to $ ;
cards;
1 1000
1 1001
2 1001
3 1000
4 1002
2 1003
5 1003
4 1004
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
   h.definekey('node');
   h.definedata('node');
   h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node; output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();


do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
   key=to;ry=_ha.check();
   if ry ne 0 then do;
   node=to;output;rr=no.remove(key:node);
   key=to;_ha.add();
   count+1;
   last=to;ha.add();
   end;
   rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

data want1 ;
if _n_ eq 1 then do;
  if 0 then set want;
  declare hash ha(dataset:'want');
  ha.definekey('node');
  ha.definedata('household');
  ha.definedone();
end;
set have;
ha.find(key:from);
run;
proc sort data=want1(drop=to) out=want2 nodupkey;by from household;run;
proc sort data=want1(drop=from) out=want3 nodupkey;by to household;run;

Xia Keshan

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

This amended code works and a whole lot faster than the original answer given in comment 2.  On my data, using code from comment 2, it took 1 hour 20 minutes and was dropping records.  The amended code above took 9 minutes and all policies and persons were accounted for.

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

The code still doesn't work if we have "from" values the same as "to" values, e.g.

 

data have;

input from $ to $ ;

cards;

1 0

1 1

2 1

3 0

4 2

2 3

5 3

4 4

;

 

These are all getting assigned the same household, instead of separate households.

Occasional Contributor
Posts: 9

Re: Finding and grouping items related by many to many

I find that if I prefix either the 'From' or the 'To' with a letter then code works

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 500 views
  • 0 likes
  • 3 in conversation