02-01-2018 02:21 AM
I'm doing research on some data which contains information on children and their parents. I would like to create an ID number for each family/entity. The tricky part (or at least for me) is that children should be given the same ID number if they are somehow linked through their parents. As an example the six children below (A1-A6) should all be given the same ID number, as (1) the first four children (A1-A4) have the same father, (2) A5 is a halfsibling to A4 (they have the same mother) and (3) A6 is a halfsibling to A5 (the have the same father)
Child ID, Mother ID, Father ID
A1, Y1, X1
A2, Y1, X1
A3, Y2, X1
A4, Y3, X1
A5, Y3, X2
A6, Y4, X2
I hope someone can help me figure this out. Thanks!
02-01-2018 03:42 AM
I think you will have to figure out your requirements correctly first.
Take data like this:
data have; input Child_ID $ Mother_ID $ Father_ID $; cards; A1 Y1 X1 A2 Y1 X1 A3 Y2 X1 A4 Y3 X1 A5 Y3 X2 A6 Y4 X2 A7 Y5 X2 A8 Y5 X3 ;run;
According to your rule, A5 and A7 should have the same family ID, because they have the same father. And A7 and A8 should have the same family ID because they have the same mother. So A5 and A8 will end up with the same family ID, even though they have no parents in common. Is this really what you want?
02-01-2018 07:51 AM
OK. How about this one ?
data have; infile cards ; input from $ to $ ; cards; Y1 A1 X1 A1 Y1 A2 X1 A2 Y2 A3 X1 A3 Y3 A4 X1 A4 Y3 A5 X2 A5 Y4 A6 X2 A6 ; run; data full; set have end=last; if _n_ eq 1 then do; declare hash h(); h.definekey('node'); h.definedata('node'); h.definedone(); end; output; node=from; h.replace(); from=to; to=node; output; node=from; h.replace(); if last then h.output(dataset:'node'); drop node; run; data want(keep=node household); declare hash ha(ordered:'a'); declare hiter hi('ha'); ha.definekey('count'); ha.definedata('last'); ha.definedone(); declare hash _ha(hashexp: 20); _ha.definekey('key'); _ha.definedone(); if 0 then set full; declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y'); from_to.definekey('from'); from_to.definedata('to'); from_to.definedone(); if 0 then set node; declare hash no(dataset:'node'); declare hiter hi_no('no'); no.definekey('node'); no.definedata('node'); no.definedone(); do while(hi_no.next()=0); household+1; output; count=1; key=node;_ha.add(); last=node;ha.add(); rc=hi.first(); do while(rc=0); from=last;rx=from_to.find(); do while(rx=0); key=to;ry=_ha.check(); if ry ne 0 then do; node=to;output;rr=no.remove(key:node); key=to;_ha.add(); count+1; last=to;ha.add(); end; rx=from_to.find_next(); end; rc=hi.next(); end; ha.clear();_ha.clear(); end; stop; run;
02-01-2018 08:53 AM
Here is shorter hash version:
data have; input Child_ID $ Mother_ID $ Father_ID $; cards; A1 Y1 X1 A2 Y1 X1 A3 Y2 X1 A4 Y3 X1 A5 Y3 X2 A6 Y4 X2 A7 Y5 X2 A8 Y5 X3 A9 Y6 X4 ; run; data want; if _n_ = 1 then do; cid = 0; if 0 then set have; declare hash hm(); hm.definekey('Mother_ID'); hm.definedone(); declare hash hf(); hf.definekey('Father_ID'); hf.definedone(); end; set have; rm = hm.find(); rf = hf.find(); if rm ^= 0 then hm.add(); if rf ^= 0 then hf.add(); if (rm ^= 0 & rf ^= 0) then cid+1; drop rm rf; run; proc print data = want; run;
02-01-2018 11:03 AM
Are you adding children to your current data set? I ask because consider this case:
You start with
A1, Y1, X1
A2, Y1, X1
A5, Y3, X2
A6, Y3, X2
(two simple family structures and you assign 2 ids)
later you add this child:
Which existing family would this go to? Reassigning existing ID would very likely be a very poor process. But your "rule" says A10 is associated with both of the existing "families".
And you don't mention the ages of the children involved so what about:
where your "children" are also parents?
While there many reasons to have single variables for simple code there are times when they can complicate other logic.
Note that there are some procedures actually deal with pairs such as mother/fathe. Proc Inbreed is one.