Discerning Many:Many Relationships: Node Chart

BenCBanks · Posted 06-17-2019 05:58 PM

I have quite the quandary of an problem to solve via PROC SQL or SAS code. Here is a sample of the dataset I am working with. I need to group the networks here and assign a value/ID to the network itself (imagine another column, labeled "NETWORK ID").

Child ID Family ID

123	456
789	456
345	456
345	912
309	912
789	298
123	912

My task is to identify the "network" of relationships here... imagine the each child ID being a node on on node chart, defined and related via the Family ID (also a node, maybe a bigger node on the chart). Once this is identified, label this as "network X" and then move on to the next network; there could be many, many networks in the dataset, probably about 45k unique networks of many Family IDs and even more Child IDs. Roughly 500k rows, 2 fields (Child/Family ID).

In the above case, you have:

Child ID 123 related to Family ID 456, 912

Child ID 789 related to family ID 456, 298

Child ID 345 related to Family ID, 456, 912

Child ID 309 related to Family ID 912

Child ID 789 related to family ID 298

What makes this especially difficult, is there there could numerous layers to this, in case, there is only 2. But as we know, networks could span much bigger degrees. Meaning, CHILD 123 could be on 456, 912, 8944, 46333, 5334 and then all those Family IDs could also have children underneath them that are related to other families and then other Children and then other families... etc.

Essentially, it could be endless, but in this case, it is not. It is defined and limited, but the limit is not known and there could many degrees. All degrees must be found though, all Children with every family must be found and mapped until the network is "broken" and no relationships existence anymore that have not been found. In other words, there are no more Children that have not been assigned to a network and all networks identified.

I am totally lost on this myself and I need some guidance on how this can be determined. Let me know if you have any questions...

My initial thoughts are using some sort of DO WHILE loop and store the values in a Macro of some sort, because you do not want the process to be recursive on itself and identify a FAMILY ID that has already been processed.

Reeza · Posted 06-17-2019 06:01 PM

PROC OPTNET or the macro here may be what you need, especially if you don't have the SAS/OR license. PROC BOM may also be an option.

https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-gra...

@BenCBanks wrote:

I have quite the quandary of an problem to solve via PROC SQL or SAS code. Here is a sample of the dataset I am working with. I need to group the networks here and assign a value/ID to the network itself (imagine another column, labeled "NETWORK ID").

Child ID Family ID

123 456

789 456

345 456

345 912

309 912

789 298

123 912

My task is to identify the "network" of relationships here... imagine the each child ID being a node on on node chart, defined and related via the Family ID (also a node, maybe a bigger node on the chart). Once this is identified, label this as "network X" and then move on to the next network; there could be many, many networks in the dataset, probably about 45k unique networks of many Family IDs and even more Child IDs. Roughly 500k rows, 2 fields (Child/Family ID).

In the above case, you have:

Child ID 123 related to Family ID 456, 912

Child ID 789 related to family ID 456, 298

Child ID 345 related to Family ID, 456, 912

Child ID 309 related to Family ID 912

Child ID 789 related to family ID 298

What makes this especially difficult, is there there could numerous layers to this, in case, there is only 2. But as we know, networks could span much bigger degrees. Meaning, CHILD 123 could be on 456, 912, 8944, 46333, 5334 and then all those Family IDs could also have children underneath them that are related to other families and then other Children and then other families... etc.

Essentially, it could be endless, but in this case, it is not. It is defined and limited, but the limit is not known and there could many degrees. All degrees must be found though, all Children with every family must be found and mapped until the network is "broken" and no relationships existence anymore that have not been found. In other words, there are no more Children that have not been assigned to a network and all networks identified.

I am totally lost on this myself and I need some guidance on how this can be determined. Let me know if you have any questions...

My initial thoughts are using some sort of DO WHILE loop and store the values in a Macro of some sort, because you do not want the process to be recursive on itself and identify a FAMILY ID that has already been processed.

BenCBanks · Posted 06-17-2019 06:07 PM

thanks for the reply. I am 99% sure I do not have that license. I will see about that macro.

Ksharp · Posted 06-18-2019 09:06 AM


data have(rename=(FamilyID=from ChildID=to));
infile cards ;
input FamilyID $ ChildID $ ;
cards;
123	456
789	456
345	456
345	912
309	912
789	298
123	912
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Re: Discerning Many:Many Relationships: Node Chart

Ready to join fellow brilliant minds for the SAS Hackathon?