I have quite a quandary to solve via PROC SQL or SAS code. Here is a sample of the dataset I am working with. I need to group the rows into networks and assign a value/ID to each network (imagine another column, labeled "NETWORK ID").

Child ID   Family ID
123        456
789        456
345        456
345        912
309        912
789        298
123        912

My task is to identify the "network" of relationships here. Imagine each Child ID being a node on a node chart, defined and related via the Family ID (also a node, maybe a bigger node on the chart). Once a network is identified, label it "network X" and then move on to the next network. There could be many, many networks in the dataset, probably about 45k unique networks of many Family IDs and even more Child IDs. The dataset is roughly 500k rows with 2 fields (Child/Family ID).

In the above case, you have:

Child ID 123 related to Family ID 456, 912
Child ID 789 related to Family ID 456, 298
Child ID 345 related to Family ID 456, 912
Child ID 309 related to Family ID 912

What makes this especially difficult is that there could be numerous layers to this; in this case, there are only 2. But as we know, networks could span many more degrees. Meaning, Child 123 could be on 456, 912, 8944, 46333, 5334, and then all those Family IDs could also have Children underneath them that are related to other Families, and then other Children, and then other Families, etc. Essentially, it could be endless, but in this case it is not. It is defined and limited, but the limit is not known and there could be many degrees. All degrees must be found, though: every Child with every Family must be found and mapped until the network is "broken" and no undiscovered relationships remain. In other words, there are no more Children that have not been assigned to a network, and all networks have been identified.

I am totally lost on this myself and I need some guidance on how this can be determined. Let me know if you have any questions...
My initial thought is to use some sort of DO WHILE loop and store the values in a macro of some sort, so that the process does not loop back on itself and re-identify a FAMILY ID that has already been processed.
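To make that idea concrete, here is a rough sketch of the loop in Python rather than SAS. It is only meant to show the logic and would still need porting to a DATA step or PROC SQL. It assumes the rows fit in memory as (child, family) pairs. The function name `label_networks` and the "C"/"F" tags are made up for illustration and are not from any SAS procedure.

```python
from collections import defaultdict, deque

def label_networks(rows):
    """Return (child_id, family_id, network_id) for every input row."""
    # Link each child to its families and each family to its children.
    # Nodes are tagged "C"/"F" (an assumption) so that a Child ID and a
    # Family ID that happen to share a number are not mixed up.
    neighbors = defaultdict(set)
    for child, family in rows:
        neighbors[("C", child)].add(("F", family))
        neighbors[("F", family)].add(("C", child))

    network_of = {}  # node -> network ID; also the "already processed" list
    next_id = 0
    for start in list(neighbors):
        if start in network_of:
            continue  # already swept up by an earlier network
        next_id += 1
        network_of[start] = next_id
        queue = deque([start])
        # The "DO WHILE" part: keep going until no new IDs turn up.
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in network_of:
                    network_of[other] = next_id
                    queue.append(other)

    return [(child, family, network_of[("C", child)]) for child, family in rows]

# The seven sample rows from the table above.
sample = [(123, 456), (789, 456), (345, 456), (345, 912),
          (309, 912), (789, 298), (123, 912)]
for row in label_networks(sample):
    print(row)
```

For the seven sample rows, every row ends up with network ID 1, because Families 456 and 912 share Children 123 and 345, and Family 298 is tied in through Child 789. The dictionary of processed IDs is what keeps the loop from revisiting a Family ID it has already handled.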