DATA Step, Macro, Functions and more

Finding depth and last child

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Finding depth and last child

[ Edited ]

I have below dataset 'test' which has two columns name and parent:

 

Explanation=> a has no parent, a is parent of b, b is parent of c, c is parent of d, e has no parent, e is parent of f, f is parent of g,

so want to found the depth of 'a' and 'e':

i have given the output for your reference:

 

data test;

input name $ parent $;

datalines;

b a

c b

d c

e

f e

g f

;

run;

 

****Expected Output*****

name depth last_child

a             3          d

e             2          g

 

I have tried to solve this but didn't get correct output.


Accepted Solutions
Solution
‎01-18-2018 12:12 PM
PROC Star
Posts: 226

Re: Finding depth and last child

I would try something like this:

proc sql;
  create index parent on test(parent);
quit;

data want;
  set test(where=(parent is null));
  parent=name;
  do depth=0 to 100; /* we stop at 100 in case there are circular references in data */
    set test(rename=(name=last_child)) key=parent;
    if _iorc_ then leave;
    parent=last_child;
    end;
  _error_=0; /* _ERROR_ is set when a key is not found, this is not an error here */
  keep name depth last_child;
run;

Using SET with KEY= is normally the easiest way to traverse trees. The value 100 is used as I assume that there are no trees with a depth greater than that - so if you find that value it may mean that you have a circular reference in your data.

View solution in original post


All Replies
Super User
Posts: 22,820

Re: Finding depth and last child

Can you show what you tried so we know what approach you're taking? 

There are several ways to solve this type of problem. 

 


amol123 wrote:

I have below dataset 'test' which has two columns name and parent:

 

Explanation=> a has no parent, a is parent of b, b is parent of c, c is parent of d, e has no parent, e is parent of f, f is parent of g,

so want to found the depth of 'a' and 'e':

i have given the output for your reference:

 

data test;

input name $ parent $;

datalines;

b a

c b

d c

e

f e

g f

;

run;

 

****Expected Output*****

name depth last_child

a             3          d

e             2          g

 

I have tried to solve this but didn't get correct output.


 

Solution
‎01-18-2018 12:12 PM
PROC Star
Posts: 226

Re: Finding depth and last child

I would try something like this:

proc sql;
  create index parent on test(parent);
quit;

data want;
  set test(where=(parent is null));
  parent=name;
  do depth=0 to 100; /* we stop at 100 in case there are circular references in data */
    set test(rename=(name=last_child)) key=parent;
    if _iorc_ then leave;
    parent=last_child;
    end;
  _error_=0; /* _ERROR_ is set when a key is not found, this is not an error here */
  keep name depth last_child;
run;

Using SET with KEY= is normally the easiest way to traverse trees. The value 100 is used as I assume that there are no trees with a depth greater than that - so if you find that value it may mean that you have a circular reference in your data.

New Contributor
Posts: 2

Re: Finding depth and last child

Thank you so much!!!!

 

PROC Star
Posts: 226

Re: Finding depth and last child

Yet another way of doing this - starting from the bottom. May be better if several nodes can have the same parent:

proc sql;
  create index name on test(name);
  create table want as select
    name as last_child,
    0 as depth,
    parent
  from test
  where name not in(select parent from test);
quit;

data want;
  modify want;
  do while(parent ne ' ');
    name=parent;
    set test key=name;
    depth=depth+1;
    end;
  parent=name;
  replace;
run;
Respected Advisor
Posts: 4,540

Re: Finding depth and last child

@amol123

Have you already searched the communities here. I know for sure that similar questions have been asked in the past (as I've been one of the people asking) and that solutions have been provided.

Super User
Posts: 10,610

Re: Finding depth and last child

[ Edited ]

Does parent could have multiple child , if it was ,then you could end with multiple last child .

 


data test;
input name $ parent $;
datalines;
a .
b a
c b
d c
e .
f e
g f
;
run;


data have;
 set test(rename=(parent=_start name=_end));
run;

proc sql;
create table parent as
 select *
  from have 
   where _start in (select _end from have where _start is missing);
quit;



data want(keep=path);
if _n_ eq 1 then do;
length path _path  $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set parent;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
*putlog 'WARNING:Found  ' _end;
   
pa.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;   
 _start=scan(path,-1,'|');
 rc=ha.find(); if rc ne 0 then output;
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
  * putlog 'WARNING:Found  ' _end;
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add(); 
   path=_path;
 end;
  rc=ha.find_next();
end;
end;
pa.clear();

run;


☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 165 views
  • 0 likes
  • 5 in conversation