BookmarkSubscribeRSS Feed
zfusfeld
Fluorite | Level 6

I don't know if I am just struggling to identify a simple solution to this problem, but I have a dataset (in wide format) which has one record for unique participants within the dataset.

 

The dataset is wide across siblings, in the sense that each unique identifier has one or more siblings identified by their own unique identifier, and these siblings may or may not also themselves be in the dataset as a unique participant.

 

each sibling is identified in the dataset by their own unique id, and as shown in the picture below, any missing value indicates no sibling (past the last non-missing value, across 14 siblings (the max in the dataset)

example dataset.PNG

You'll notice just from this incomplete screenshot of the dataset which is significantly larger, that some of the sibling unique identifiers (eg (sib01_uniqid, sib02_uniqid, etc) are also represented in the unique identifier field for the original observations (uniqid), as well as some that never appear as a simple unique ID in the dataset (e.g., they are known within the broader dataset, but not included in this specific subsample).

 

Is there an easy/simple way for me to identify those individuals who do have siblings among the original ID's, and create an indicator as such? I am sure there must be a relatively simple way, either in long or in wide format - I have both formats of this dataset available.

 

Any help is greatly appreciated!

4 REPLIES 4
HB
Barite | Level 11 HB
Barite | Level 11

 

1. That is bad data structure.

 

2.  Posting  a working data step so people don't have to retype your data or make some data to work with greatly increases your chances of a useful response.    

 

3. "identify those individuals who do have siblings among the original ID's, and create an indicator"

So an indicator (new column) that 4037 has siblings in the data.  Doesn't matter whether 4044 is in or out, 4180 is in so 4037 gets an indicator?

 

4.  In the bigger picture, what are you really trying to do?

 

 

zfusfeld
Fluorite | Level 6
Hey, thought I actually deleted this thread!

It’s an intermediate data step in a complex target population identification process involving multiple different levels of observation within the family unit, but in any case it doesn’t matter, I realized where I was going wrong and figured it out with significantly more ease after I stepped away for 10 mins.

This was never an analytic dataset, this was primarily borne out of an intragenerational family map, but I realized I could do it with more ease by just doing it intergenerationally and centering on the parents as opposed to the children, even though they’re the eventual unit of observation for analysis. Anyway , thanks for your reply
zfusfeld
Fluorite | Level 6
I’m an inexperienced user of SAS communities, so I appreciate the feedback.

Cheers
Ksharp
Super User
data have;
infile cards delimiter='|';
/*from is parent, to is child*/
input from $  to $ ;
cards;
        2    |    4  
        2    |    5  
        2    |    6  
        2    |    8
        4    |    7  
        4    |    11  
        6    |    9  
        6    |    10
        6    |    12
        10  |    15  
        10  |    16  
        13  |    14  
        16  |    17
          99|2
          7|4
          100|20
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node; output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

    
 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 757 views
  • 1 like
  • 3 in conversation