BookmarkSubscribeRSS Feed
BenCBanks
Fluorite | Level 6

I have quite the quandary of an problem to solve via PROC SQL or SAS code. Here is a sample of the dataset I am working with. I need to group the networks here and assign a value/ID to the network itself (imagine another column, labeled "NETWORK ID"). 

 

Child ID Family ID

123456
789456
345456
345912
309912
789298
123912

 

 

My task is to identify the "network" of relationships here... imagine the each child ID being a node on on node chart, defined and related via the Family ID (also a node, maybe a bigger node on the chart). Once this is identified, label this as "network X" and then move on to the next network; there could be many, many networks in the dataset, probably about 45k unique networks of many Family IDs and even more Child IDs. Roughly 500k rows, 2 fields (Child/Family ID). 

 

In the above case, you have:

Child ID 123 related to Family ID 456, 912

Child ID 789 related to family ID 456, 298

Child ID 345 related to Family ID, 456, 912

Child ID 309 related to Family ID 912

Child ID 789 related to family ID 298

 

What makes this especially difficult, is there there could numerous layers to this, in case, there is only 2. But as we know, networks could span much bigger degrees. Meaning, CHILD 123 could be on 456, 912, 8944, 46333, 5334 and then all those Family IDs could also have children underneath them that are related to other families and then other Children and then other families... etc.

 

Essentially, it could be endless, but in this case, it is not. It is defined and limited, but the limit is not known and there could many degrees. All degrees must be found though, all Children with every family must be found and mapped until the network is "broken" and no relationships existence anymore that have not been found. In other words, there are no more Children that have not been assigned to a network and all networks identified. 

 

I am totally lost on this myself and I need some guidance on how this can be determined. Let me know if you have any questions...

 

My initial thoughts are using some sort of DO WHILE loop and store the values in a Macro of some sort, because you do not want the process to be recursive on itself and identify a FAMILY ID that has already been processed. 

3 REPLIES 3
Reeza
Super User

PROC OPTNET or the macro here may be what you need, especially if you don't have the SAS/OR license. PROC BOM may also be an option. 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-find-all-connected-components-in-a-gra...

 


@BenCBanks wrote:

I have quite the quandary of an problem to solve via PROC SQL or SAS code. Here is a sample of the dataset I am working with. I need to group the networks here and assign a value/ID to the network itself (imagine another column, labeled "NETWORK ID"). 

 

Child ID Family ID

123 456
789 456
345 456
345 912
309 912
789 298
123 912

 

 

My task is to identify the "network" of relationships here... imagine the each child ID being a node on on node chart, defined and related via the Family ID (also a node, maybe a bigger node on the chart). Once this is identified, label this as "network X" and then move on to the next network; there could be many, many networks in the dataset, probably about 45k unique networks of many Family IDs and even more Child IDs. Roughly 500k rows, 2 fields (Child/Family ID). 

 

In the above case, you have:

Child ID 123 related to Family ID 456, 912

Child ID 789 related to family ID 456, 298

Child ID 345 related to Family ID, 456, 912

Child ID 309 related to Family ID 912

Child ID 789 related to family ID 298

 

What makes this especially difficult, is there there could numerous layers to this, in case, there is only 2. But as we know, networks could span much bigger degrees. Meaning, CHILD 123 could be on 456, 912, 8944, 46333, 5334 and then all those Family IDs could also have children underneath them that are related to other families and then other Children and then other families... etc.

 

Essentially, it could be endless, but in this case, it is not. It is defined and limited, but the limit is not known and there could many degrees. All degrees must be found though, all Children with every family must be found and mapped until the network is "broken" and no relationships existence anymore that have not been found. In other words, there are no more Children that have not been assigned to a network and all networks identified. 

 

I am totally lost on this myself and I need some guidance on how this can be determined. Let me know if you have any questions...

 

My initial thoughts are using some sort of DO WHILE loop and store the values in a Macro of some sort, because you do not want the process to be recursive on itself and identify a FAMILY ID that has already been processed. 


 

BenCBanks
Fluorite | Level 6

thanks for the reply. I am 99% sure I do not have that license. I will see about that macro.

Ksharp
Super User

data have(rename=(FamilyID=from ChildID=to));
infile cards ;
input FamilyID $ ChildID $ ;
cards;
123	456
789	456
345	456
345	912
309	912
789	298
123	912
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 590 views
  • 0 likes
  • 3 in conversation