Identifying sibling structures in dataset

zfusfeld · Posted 12-05-2022 02:03 PM

I don't know if I am just struggling to identify a simple solution to this problem, but I have a dataset (in wide format) which has one record for unique participants within the dataset.

The dataset is wide across siblings, in the sense that each unique identifier has one or more siblings identified by their own unique identifier, and these siblings may or may not also themselves be in the dataset as a unique participant.

each sibling is identified in the dataset by their own unique id, and as shown in the picture below, any missing value indicates no sibling (past the last non-missing value, across 14 siblings (the max in the dataset)

You'll notice just from this incomplete screenshot of the dataset which is significantly larger, that some of the sibling unique identifiers (eg (sib01_uniqid, sib02_uniqid, etc) are also represented in the unique identifier field for the original observations (uniqid), as well as some that never appear as a simple unique ID in the dataset (e.g., they are known within the broader dataset, but not included in this specific subsample).

Is there an easy/simple way for me to identify those individuals who do have siblings among the original ID's, and create an indicator as such? I am sure there must be a relatively simple way, either in long or in wide format - I have both formats of this dataset available.

Any help is greatly appreciated!

HB · Posted 12-05-2022 05:45 PM

1. That is bad data structure.

2. Posting a working data step so people don't have to retype your data or make some data to work with greatly increases your chances of a useful response.

3. "identify those individuals who do have siblings among the original ID's, and create an indicator"

So an indicator (new column) that 4037 has siblings in the data. Doesn't matter whether 4044 is in or out, 4180 is in so 4037 gets an indicator?

4. In the bigger picture, what are you really trying to do?

zfusfeld · Posted 12-05-2022 06:23 PM

Hey, thought I actually deleted this thread!

It’s an intermediate data step in a complex target population identification process involving multiple different levels of observation within the family unit, but in any case it doesn’t matter, I realized where I was going wrong and figured it out with significantly more ease after I stepped away for 10 mins.

This was never an analytic dataset, this was primarily borne out of an intragenerational family map, but I realized I could do it with more ease by just doing it intergenerationally and centering on the parents as opposed to the children, even though they’re the eventual unit of observation for analysis. Anyway , thanks for your reply

zfusfeld · Posted 12-05-2022 06:24 PM

I’m an inexperienced user of SAS communities, so I appreciate the feedback.

Cheers

Ksharp · Posted 12-06-2022 06:39 AM

data have;
infile cards delimiter='|';
/*from is parent, to is child*/
input from $  to $ ;
cards;
        2    |    4  
        2    |    5  
        2    |    6  
        2    |    8
        4    |    7  
        4    |    11  
        6    |    9  
        6    |    10
        6    |    12
        10  |    15  
        10  |    16  
        13  |    14  
        16  |    17
          99|2
          7|4
          100|20
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node; output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Catch up on SAS Innovate 2026

Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Re: Identifying sibling structures in dataset

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away