I found a lot of posts about assigning group IDs, but all of them seemed a little different from what I'm looking for.
I have a dataset that includes individuals and their family members - some of these family members are included in the same study (and have their own unique study ID and information about their relationship) as well as some information on some other family members not included in the study (and do not have a unique ID but have their relationship to those in the study). I'm wanting to assign a group ID for each family.
The have data looks like this - where relativeID indicates the relatives ID if they are included in the study, relativerelationship is their relationship to the ID case, and IDs within a family aren't necessarily sequential.
ID | RelativeID | RelativeRelationship |
10001 | . | Grandfather |
10001 | 10010 | Mother |
10001 | 12200 | Brother |
10010 | . | Father |
10010 | 10001 | Son |
10010 | 12200 | Son |
11100 | 33000 | Mother |
12200 | . | Grandfather |
12200 | 10001 | Brother |
12200 | 10010 | Mother |
33000 | 11100 | Son |
The want data would look like this. I'm wanting to assign an ID for each unique family group, including for those who do not have a relativeID:
ID | RelativeID | RelativeRelationship | FamilyID |
10001 | . | Grandfather | 111 |
10001 | 10010 | Mother | 111 |
10001 | 12200 | Brother | 111 |
10010 | . | Father | 111 |
10010 | 10001 | Son | 111 |
10010 | 12200 | Son | 111 |
11100 | 33000 | Mother | 222 |
12200 | . | Grandfather | 111 |
12200 | 10001 | Brother | 111 |
12200 | 10010 | Mother | 111 |
33000 | 11100 | Son | 222 |
Maybe I'm overthinking things, but it seems like the family members that don't have their own unique RelativeID kind of complicates things. I'm struggling to think of an efficient way to do this. My SQL coding is pretty rusty, but my brain is saying this is probably an issue best solved with SQL, but I could be wrong.
Any insight on this would be appreciated.
DATA step with hash object makes this easy:
data have;
infile datalines;
input ID RelativeID RelativeRelationship :$15.;
datalines;
10001 . Grandfather
10001 10010 Mother
10001 12200 Brother
10010 . Father
10010 10001 Son
10010 12200 Son
11100 33000 Mother
12200 . Grandfather
12200 10001 Brother
12200 10010 Mother
33000 11100 Son
;
data want;
set have;
retain r_family 0;
if _n_ = 1
then do;
length family_id 8;
declare hash fam ();
fam.definekey("id");
fam.definedata("family_id");
fam.definedone();
end;
if fam.find() ne 0
then do;
r_family + 1;
family_id = r_family;
rc = fam.add();
end;
if relativeid ne . and fam.check(key:relativeid) ne 0
then rc = fam.add(key:relativeid,data:family_id);
drop r_family;
run;
You only need to tweak the code for creating a new family_id.
It may be that this code runs out of memory if it has to deal with a VERY large dataset; each distinct id should consume 48 bytes of memory.
DATA step with hash object makes this easy:
data have;
infile datalines;
input ID RelativeID RelativeRelationship :$15.;
datalines;
10001 . Grandfather
10001 10010 Mother
10001 12200 Brother
10010 . Father
10010 10001 Son
10010 12200 Son
11100 33000 Mother
12200 . Grandfather
12200 10001 Brother
12200 10010 Mother
33000 11100 Son
;
data want;
set have;
retain r_family 0;
if _n_ = 1
then do;
length family_id 8;
declare hash fam ();
fam.definekey("id");
fam.definedata("family_id");
fam.definedone();
end;
if fam.find() ne 0
then do;
r_family + 1;
family_id = r_family;
rc = fam.add();
end;
if relativeid ne . and fam.check(key:relativeid) ne 0
then rc = fam.add(key:relativeid,data:family_id);
drop r_family;
run;
You only need to tweak the code for creating a new family_id.
It may be that this code runs out of memory if it has to deal with a VERY large dataset; each distinct id should consume 48 bytes of memory.
This is exactly what I was looking for!
I've put off learning about hash objects for years, but this is really cool. Looks like it's time for me to learn more about them.
Thanks for the help!
data x;
infile datalines;
input ID RelativeID RelativeRelationship :$15.;
datalines;
10001 . Grandfather
10001 10010 Mother
10001 12200 Brother
10010 . Father
10010 10001 Son
10010 12200 Son
11100 33000 Mother
12200 . Grandfather
12200 10001 Brother
12200 10010 Mother
33000 11100 Son
;
data have;
set x(rename=(ID=from RelativeID=to));
keep from to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
data final_want;
if _n_=1 then do;
if 0 then set want(rename=(household=FamilyID));
declare hash h(dataset:'want(rename=(household=FamilyID))');
h.definekey('node');
h.definedata('FamilyID');
h.definedone();
end;
set x;
call missing(FamilyID);
rc=h.find(key:id);
drop rc node;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.