BookmarkSubscribeRSS Feed
CharlotteH
Calcite | Level 5

Hi

I'm doing research on some data which contains information on children and their parents. I would like to create an ID number for each family/entity. The tricky part (or at least for me) is that children should be given the same ID number if they are somehow linked through their parents. As an example the six children below (A1-A6) should all be given the same ID number, as (1) the first four children (A1-A4) have the same father, (2) A5 is a halfsibling to A4 (they have the same mother) and (3) A6 is a halfsibling to A5 (the have the same father)  

 

Child ID, Mother ID, Father ID

A1, Y1, X1

A2, Y1, X1

A3, Y2, X1 

A4, Y3, X1

A5, Y3, X2

A6, Y4, X2

 

I hope someone can help me figure this out. Thanks!

 

5 REPLIES 5
s_lassen
Meteorite | Level 14

I think you will have to figure out your requirements correctly first.

Take data like this:

data have;
  input Child_ID $ Mother_ID $ Father_ID $;
cards;
A1 Y1 X1
A2 Y1 X1
A3 Y2 X1 
A4 Y3 X1
A5 Y3 X2
A6 Y4 X2
A7 Y5 X2
A8 Y5 X3
;run;

According to your rule, A5 and A7 should have the same family ID, because they have the same father. And A7 and A8 should have the same family ID because they have the same mother. So A5 and A8 will end up with the same family ID, even though they have no parents in common. Is this really what you want?

CharlotteH
Calcite | Level 5

Exactely. I know it sounds a bit strange, but for the purpose I'm working on at the moment that's what I want.   

Ksharp
Super User

OK. How about this one ?

 


data have;
infile cards ;
input from $  to $ ;
cards;
 Y1 A1
 X1 A1
Y1 A2 
X1 A2 
Y2 A3
X1 A3
Y3 A4
X1 A4
Y3 A5
X2 A5
Y4 A6 
X2 A6
;
run;

data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
KachiM
Rhodochrosite | Level 12

Here is shorter hash version:

 

data have;
  input Child_ID $ Mother_ID $ Father_ID $;
cards;
A1 Y1 X1
A2 Y1 X1
A3 Y2 X1 
A4 Y3 X1
A5 Y3 X2
A6 Y4 X2
A7 Y5 X2
A8 Y5 X3
A9 Y6 X4
;
run;

data want;
   if _n_ = 1 then do;
      cid = 0;
      if 0 then set have;
      declare hash hm();
      hm.definekey('Mother_ID');
      hm.definedone();
      declare hash hf();
      hf.definekey('Father_ID');
      hf.definedone();
   end;
   set have;
   rm = hm.find();
   rf = hf.find();
   if rm ^= 0 then hm.add();
   if rf ^= 0 then hf.add();
   if (rm ^= 0 & rf ^= 0) then cid+1;
drop rm rf;
run;
proc print data = want;
run;
ballardw
Super User

Are you adding children to your current data set? I ask because consider this case:

 

You start with

A1, Y1, X1

A2, Y1, X1

 

A5, Y3, X2

A6, Y3, X2

(two simple family structures and you assign 2 ids)

later you add this child:

 

A10, Y1,X2

Which existing family would this go to? Reassigning existing ID would very likely be a very poor process. But your "rule" says A10 is associated with both of the existing "families".

 

And you don't mention the ages of the children involved so what about:

 

A230, A5,X10

where your "children" are also parents?

 

While there many reasons to have single variables for simple code there are times when they can complicate other logic.

 

Note that there are some procedures actually deal with pairs such as mother/fathe. Proc Inbreed is one.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1576 views
  • 1 like
  • 5 in conversation