DATA Step, Macro, Functions and more

Parent- Child hierarchy

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Parent- Child hierarchy

Hello,

 

I have 2 variables in a dataset viz child code and parent code. I want to filter for all the childrens within a parent code.

Eg:

Child codeMother code
21
32
42
52
63
76
87

 

From above sample dataset, If i select 2 I want all the childrens and grand childrens under 2 ie 3, 4, 5, 6, 7, 8 and if i select 3 as a parent code then 6, 7, 8 etc..

 

Kindly help..

 

Thanks,

Ravi


Accepted Solutions
Solution
‎08-23-2016 09:51 PM
Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

Here is a simple solution using an index.

 

data have(index=(child / unique));
input Child Mother;
datalines;
2	1
3	2
4	2
5	2
6	3
7	6
8	7
;

data want;
set have;
ancestor = mother;
descendent = child;
output;
do i = 1 to 100 until(_error_); /* catch infinite loops */
    child = mother;
    set have key=child / unique;
    if not _error_ then do;
        ancestor = mother;
        output;
        end;
    end;
_error_ = 0;
keep ancestor descendent;
run;

proc sort data=want; by ancestor descendent; run;

proc print; run;
PG

View solution in original post


All Replies
Super User
Posts: 17,948

Re: Parent- Child hierarchy

How are you specifying the 2/3? Can you specify multiples ie 2 & 3?

 

What do you want the output to look like?

Occasional Contributor
Posts: 6

Re: Parent- Child hierarchy

[ Edited ]

Thanks for your reply Reeza.

 

Yes, there should be an ability to specify 2 and 3.

 

Output will look like:

Mother codeChild code
23
24
25
26
27
36
37
38
78

 

Hope this helps.

 

Thanks,

Ravi

Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

Is there a limit to the number of generations between mother and descendant that you want? 

PG
Occasional Contributor
Posts: 6

Re: Parent- Child hierarchy

Thanks for your reply PG Stats.

 

There is no set limit as such for the generation of rows. The generations will be until the last child code is available in a relationship.

 

Thanks,

Ravi Megharaj

Super User
Posts: 17,948

Re: Parent- Child hierarchy

This is a traverse the graph problem. 

 

One way to accomplish this is to identify all clusters and then you can filter what you need from final table. 

 

This code is @PGStats solution  for this type of problem. If it's not, he'll correct me Woman Wink

 

https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

Unfortunately, the subGraph macro cannot solve this problem because it is meant for undirected graphs. But as we all know the parent-child relationship is directed Man Happy. The subgraph macro would thus link a person to all members of his/her family.

PG
Solution
‎08-23-2016 09:51 PM
Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

Here is a simple solution using an index.

 

data have(index=(child / unique));
input Child Mother;
datalines;
2	1
3	2
4	2
5	2
6	3
7	6
8	7
;

data want;
set have;
ancestor = mother;
descendent = child;
output;
do i = 1 to 100 until(_error_); /* catch infinite loops */
    child = mother;
    set have key=child / unique;
    if not _error_ then do;
        ancestor = mother;
        output;
        end;
    end;
_error_ = 0;
keep ancestor descendent;
run;

proc sort data=want; by ancestor descendent; run;

proc print; run;
PG
Super User
Posts: 9,691

Re: Parent- Child hierarchy

PG,
What if there is dead loop ? Like:

2 1
3 2
1 3

Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

Looping will stop when i reaches 100. A check could (should) be added for this condition

 

if i > 99 then ERROR "Invalid data involving descendant " descendent;

else _ERROR_ = 0;

 

(untested)

PG
Super User
Posts: 9,691

Re: Parent- Child hierarchy

PG,

What if there are actually more than 100 strat/subject , i.e. 120 different subject ?
Respected Advisor
Posts: 4,660

Re: Parent- Child hierarchy

The loop travels the tree from child to its most ancient ancestor. So you would need data about more than 100 generations (not subjects) to reach the limit. That limit could be increased of course, but 100 seemed reasonable to me.

PG
Occasional Contributor
Posts: 6

Re: Parent- Child hierarchy

Thanks a lot PGStats. Your solution helps and resolves my issue.

Regular Contributor
Posts: 242

Re: Parent- Child hierarchy

I tried to implement FIFO (first in first out stack) concept to find all the child. User need to provide mother id in macro variable

search_val

 

The final desired result would come in work.output_list.

 

 

%let search_val=2;

data input;
input C_cd	M_cd;
datalines;
2 1
3 2
4 2
5 2
6 3
7 6
8 7
;



proc sql;
create table  parent_child as
select a.c_cd as ID,
a.M_cd as mother, b.C_cd as child,
put(a.M_cd,8.) || ", "||put(a.c_cd,8.)|| ", "||put(b.C_cd ,8.)as list
from input a left join input b
on a.C_cd = b.M_cd;
quit;
%let cnt_val =1;


proc sql;
drop table fifo_list ;
drop table output_list ;
drop table   new_list;
quit;

data fifo_list;
child=&search_val;
run;


%macro fifo;

%start: %put &cnt_val ;
data _null_;
set fifo_list   (obs=1);
call symputx('val',child);
run;


proc sql;
delete from fifo_list   where child=&val;

create table new_list as
select child from parent_child 
where ID=&val and child <> . ;
quit;

proc append base=fifo_list   data=new_list;run;

proc append base=output_list data=new_list;run;

proc sql;
select count(*) into :cnt_val 
from fifo_list   where child <> . ;  ;
quit;
%if &cnt_val > 0 %then %goto start;

%mend;

%fifo;

 

Super User
Posts: 9,691

Re: Parent- Child hierarchy

Here is mine. Check LOG after running code.



data have;
input _end  _start ;
cards;
2 1
3 2
4 2
5 2
6 3
7 6
8 7
;
run;
 
%let start=2;
data want(keep=path);
if _n_ eq 1 then do;
length path _path  $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set have(where=(_start=&start));
count=1;n=1;_n=1;
path=catx('|',_start,_end);
putlog 'WARNING:Found  ' _end;
   
pa.add();output;
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;   
 _start=scan(path,-1,'|');
 rc=ha.find();
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
   putlog 'WARNING:Found  ' _end;
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add(); output;
   path=_path;
 end;
  rc=ha.find_next();
end;
end;
pa.clear();

run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 941 views
  • 6 likes
  • 5 in conversation