I don't know if I am just struggling to identify a simple solution to this problem, but I have a dataset (in wide format) which has one record for unique participants within the dataset.
The dataset is wide across siblings, in the sense that each unique identifier has one or more siblings identified by their own unique identifier, and these siblings may or may not also themselves be in the dataset as a unique participant.
each sibling is identified in the dataset by their own unique id, and as shown in the picture below, any missing value indicates no sibling (past the last non-missing value, across 14 siblings (the max in the dataset)
You'll notice just from this incomplete screenshot of the dataset which is significantly larger, that some of the sibling unique identifiers (eg (sib01_uniqid, sib02_uniqid, etc) are also represented in the unique identifier field for the original observations (uniqid), as well as some that never appear as a simple unique ID in the dataset (e.g., they are known within the broader dataset, but not included in this specific subsample).
Is there an easy/simple way for me to identify those individuals who do have siblings among the original ID's, and create an indicator as such? I am sure there must be a relatively simple way, either in long or in wide format - I have both formats of this dataset available.
Any help is greatly appreciated!
1. That is bad data structure.
2. Posting a working data step so people don't have to retype your data or make some data to work with greatly increases your chances of a useful response.
3. "identify those individuals who do have siblings among the original ID's, and create an indicator"
So an indicator (new column) that 4037 has siblings in the data. Doesn't matter whether 4044 is in or out, 4180 is in so 4037 gets an indicator?
4. In the bigger picture, what are you really trying to do?
data have;
infile cards delimiter='|';
/*from is parent, to is child*/
input from $ to $ ;
cards;
2 | 4
2 | 5
2 | 6
2 | 8
4 | 7
4 | 11
6 | 9
6 | 10
6 | 12
10 | 15
10 | 16
13 | 14
16 | 17
99|2
7|4
100|20
;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node; output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.