Creating family IDs

New Contributor
Posts: 3

Hi

I'm doing research on data that contain information on children and their parents. I would like to create an ID number for each family/entity. The tricky part (at least for me) is that children should be given the same ID number if they are linked in any way through their parents. For example, the six children below (A1-A6) should all be given the same ID number, because (1) the first four children (A1-A4) have the same father, (2) A5 is a half-sibling of A4 (they have the same mother), and (3) A6 is a half-sibling of A5 (they have the same father).

 

Child ID, Mother ID, Father ID
A1, Y1, X1
A2, Y1, X1
A3, Y2, X1
A4, Y3, X1
A5, Y3, X2
A6, Y4, X2

 

I hope someone can help me figure this out. Thanks!

 

PROC Star
Posts: 266

Re: Creating family IDs

Posted in reply to CharlotteH

I think you will have to figure out your requirements correctly first.

Take data like this:

data have;
  input Child_ID $ Mother_ID $ Father_ID $;
cards;
A1 Y1 X1
A2 Y1 X1
A3 Y2 X1 
A4 Y3 X1
A5 Y3 X2
A6 Y4 X2
A7 Y5 X2
A8 Y5 X3
;
run;

According to your rule, A5 and A7 should have the same family ID, because they have the same father. And A7 and A8 should have the same family ID because they have the same mother. So A5 and A8 will end up with the same family ID, even though they have no parents in common. Is this really what you want?

New Contributor
Posts: 3

Re: Creating family IDs

Exactly. I know it sounds a bit strange, but for the purpose I'm working on at the moment that's what I want.

Super User
Posts: 10,784

Re: Creating family IDs

Posted in reply to CharlotteH

OK. How about this one?

 


data have;
  infile cards;
  input from $ to $;
cards;
Y1 A1
X1 A1
Y1 A2
X1 A2
Y2 A3
X1 A3
Y3 A4
X1 A4
Y3 A5
X2 A5
Y4 A6
X2 A6
;
run;

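/* Make every link bidirectional and collect the distinct node IDs. */
data full;
  set have end=last;
  if _n_ eq 1 then do;
    declare hash h();
    h.definekey('node');
    h.definedata('node');
    h.definedone();
  end;
  output;                                   /* original direction: from -> to */
  node=from; h.replace();
  from=to; to=node;                         /* swap the endpoints */
  output;                                   /* reverse direction: to -> from */
  node=from; h.replace();
  if last then h.output(dataset:'node');    /* one row per distinct node */
  drop node;
run;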


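/* Breadth-first search: every connected group of nodes gets one household number. */
data want(keep=node household);
  /* ha: queue of the nodes in the current household, in the order they are reached */
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  /* _ha: nodes already visited in the current household */
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  /* from_to: neighbour lookup, several 'to' values per 'from' */
  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  /* no: nodes that have not been assigned to a household yet */
  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    household+1; output;                 /* start a new household at this node */
    count=1;
    key=node; _ha.add();
    last=node; ha.add();
    rc=hi.first();
    do while(rc=0);                      /* walk the queue */
      from=last; rx=from_to.find();
      do while(rx=0);                    /* loop over the neighbours of 'from' */
        key=to; ry=_ha.check();
        if ry ne 0 then do;              /* neighbour not visited yet */
          node=to; output; rr=no.remove(key:node);
          key=to; _ha.add();
          count+1;
          last=to; ha.add();
        end;
        rx=from_to.find_next();
      end;
      rc=hi.next();
    end;
    ha.clear(); _ha.clear();
  end;
  stop;
run;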
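
The steps above read from/to pairs rather than the child/mother/father layout from the question. A minimal sketch of the reshaping, assuming the wide data sits in a data set called have_wide (a made-up name) with the variables Child_ID, Mother_ID and Father_ID used elsewhere in this thread. The output is named have so that it can feed the DATA FULL step directly:

```sas
/* have_wide is a hypothetical name; columns follow the question's example. */
data have_wide;
  input Child_ID $ Mother_ID $ Father_ID $;
cards;
A1 Y1 X1
A2 Y1 X1
A3 Y2 X1
A4 Y3 X1
A5 Y3 X2
A6 Y4 X2
;
run;

/* One row per parent-child link, in the from/to layout. */
data have(keep=from to);
  set have_wide;
  length from to $ 8;
  /* Skip a link when the parent is unknown. */
  if not missing(Mother_ID) then do;
    from = Mother_ID; to = Child_ID; output;
  end;
  if not missing(Father_ID) then do;
    from = Father_ID; to = Child_ID; output;
  end;
run;
```

The resulting WANT data set lists parents as well as children, so keep only the rows whose node is a Child_ID if you need one household number per child.
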
Super Contributor
Posts: 326

Re: Creating family IDs

Posted in reply to CharlotteH

Here is a shorter hash version:

 

data have;
  input Child_ID $ Mother_ID $ Father_ID $;
cards;
A1 Y1 X1
A2 Y1 X1
A3 Y2 X1 
A4 Y3 X1
A5 Y3 X2
A6 Y4 X2
A7 Y5 X2
A8 Y5 X3
A9 Y6 X4
;
run;

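data want;
   if _n_ = 1 then do;
      cid = 0;
      if 0 then set have;
      declare hash hm();               /* mothers seen so far */
      hm.definekey('Mother_ID');
      hm.definedone();
      declare hash hf();               /* fathers seen so far */
      hf.definekey('Father_ID');
      hf.definedone();
   end;
   set have;
   rm = hm.check();                    /* 0 if this mother was seen before */
   rf = hf.check();                    /* 0 if this father was seen before */
   if rm ^= 0 then hm.add();
   if rf ^= 0 then hf.add();
   /* A new family starts only when both parents are new. Otherwise the row
      keeps the cid of the previous row, so this relies on linked rows being
      adjacent in HAVE, as they are in this sample. */
   if (rm ^= 0 & rf ^= 0) then cid+1;
   drop rm rf;
run;

proc print data = want;
run;
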
Super User
Posts: 13,574

Re: Creating family IDs

Posted in reply to CharlotteH

Are you adding children to your current data set? I ask because of cases like this:

 

You start with

A1, Y1, X1
A2, Y1, X1

A5, Y3, X2
A6, Y3, X2

(two simple family structures, to which you assign two IDs)

later you add this child:

 

A10, Y1, X2

Which existing family would this child go to? Reassigning existing IDs would very likely be a poor process. But your "rule" says A10 is associated with both of the existing "families".
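
To see the merge concretely, the five children above can be run through a connected-components routine. This is only a sketch, and it assumes SAS/OR is licensed so that PROC OPTNET and its CONCOMP statement are available. The data set names links and nodes are made up for the illustration.

```sas
/* Parent-child links for the two original families plus the new child A10. */
data links;
  input from $ to $;
cards;
Y1 A1
X1 A1
Y1 A2
X1 A2
Y3 A5
X2 A5
Y3 A6
X2 A6
Y1 A10
X2 A10
;
run;

/* Assumption: SAS/OR is installed. CONCOMP labels the connected components
   of the (by default undirected) graph defined by the from/to links. */
proc optnet data_links=links out_nodes=nodes;
  concomp;
run;

proc print data=nodes;
run;
```

Because A10 links mother Y1 to father X2, every node in this input should land in a single component. Dropping the two A10 rows should give two components, which is exactly the reassignment problem described above.
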

 

And you don't mention the ages of the children involved so what about:

 

A230, A5, X10

where your "children" are also parents?

 

While there are many reasons to have single variables for simple code, there are times when they can complicate other logic.

 

Note that some procedures actually deal with pairs such as mother/father. PROC INBREED is one.
