How to find great grand parents in the dataset

_el_doredo · Posted 03-15-2021 01:41 PM

Hi Experts,

I need to find out great grandparent for the child using this dataset.

DATA hierarchy;

input parent $ child $;

cards;

A B

B K

K C

;

RUN;

In this dataset, Value C's parent is K and C's grandparent is B and C's great grandparent A. So i need result like this

I tried with PROC SQL. I am able to find out till Grandparent but i am not able to find out Great_grandparent hierarchy.

I used this code

proc sql;
select child,parent,
(select parent from hierarchy where child=a.parent) as grandparent
from hierarchy as a
where child='C';
quit;

but i am not able to find great grandparent...

Can you guys please suggest a way to achieve this use proc sql itself

Thanks in advance

Regards,

Syed

Reeza · Posted 03-15-2021 01:45 PM

Have you tried repeating that one more time? Separate it into two separate steps.

If you have SAS/OR you can also look at PROC BOM or there's a subgraph macro. Your problem is analogous to a bill of materials problem where one item is composed of others or the graph problem of looking for a path or node in a graph.

_el_doredo · Posted 03-15-2021 01:49 PM

Hi Reeza,

So far i didn't used PROC BOM or subgraph... If you send a code using PROC BOM or subgraph means it will be useful for my learning.

I tried with repeating the code one more time but not getting expected results..

Thanks,
Syed

Reeza · Posted 03-15-2021 03:37 PM

Please show what you've tried. You really are just repeating the exact same logic one more time, replacing the parents with the grandparents to find the great grand parents.

_el_doredo · Posted 03-17-2021 11:28 AM

Hi Reez,

I tried this way and it worked.I am looking for any better code to use with all type of hierarchy scenarios

data hierarchy;
input parent $ child $ ;
cards;
A B
B K
K C
;
run;

proc sql;
create table master as
select A.parent from hierarchy as A
union all
select A.child from hierarchy as A
where A.child not in (select B.parent from hierarchy as B)
;
quit;

data dummy(drop=counter);
set master;
counter=_n_;
if counter=1 then relationship='great grand parent';
else if counter=2 then relationship='grand parent';
else if counter=3 then relationship='parent';
else if counter=4 then relationship='child';
proc print noobs;
run;

proc transpose data=dummy out=final(drop=_name_);
id relationship;
var parent;
run;

JonathanWill · Posted 03-15-2021 02:19 PM

You could create a format to do the child->parent lookup and then use this to create the grandparent/greatgrandparent in a datastep

data fmt_hier / view=fmt_hier;
  retain 
    fmtName 'parent'
    type 'c'
  ;
  
  set hierarchy(rename=(child=start parent=label)) end=eof;
  output;
  
  if eof then do;
    hlo='O';
    label='?';
    output;
  end;
run;
proc format cntlin=fmt_hier;
run;

data parents;
  set hierarchy;
  
  grandparent = put(parent, $parent.);
  greatgrandparent = put(grandparent, $parent.);
run;

proc print data=parents;
  id child;
run;

Result:

PeterClemmensen · Posted 03-15-2021 02:26 PM

A hash approach

DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;

data want(drop = rc);

   format child parent grandparent greatgrandparent;

   if _N_ = 1 then do;
      dcl hash h(dataset : "hierarchy");
      h.definekey("child");
      h.definedata("parent");
      h.definedone();
   end;
   
   set hierarchy;

   rc = h.find(key : parent);

   if rc = 0 then do;
      grandparent = parent;
      rc = h.find(key : grandparent);
   end;

   if rc = 0 then greatgrandparent = parent;

   rc = h.find();

run;

Result:

child parent grandparent greatgrandparent 
B     A          
K     B      A   
C     K      B           A

The DATA to DATA Step Macro
Blog: SASnrd

_el_doredo · Posted 03-17-2021 11:41 AM

Hi Clemmensen,
Thanks for your solution... It working fine as expected.. I am not familiar with hash approach.. So thanks for introducing one new concept to me...

_el_doredo · Posted 03-17-2021 11:40 AM

Hi William,
Thanks for you solution.. It is working perfectly fine. But, I have few doubts..
1) What's the purpose of using type 'C'. Is it used to mention fmtname is character variable?. I tried with using Dollar symbol($) and _CHARACTER_ but it throughing some error... It will be helpful if i got to know the purpose

2)Why we are using hlo='0'. I know it has to do with formatting.. Because it throughing error if we removed that line. So i am not sure about the purpose of this..

JonathanWill · Posted 03-18-2021 05:59 AM

@_el_doredo

1) Correct. The type = 'C' identifies it as a character format. 'N' would be numeric format, 'I' numeric informat.

2) The if eof block that contains hlo='O' creates an extra entry in the format that specifies the OTHER value. So this will be used for any input that is not in the known values. If you do not have this in the format any unknown input value will simply give the input value back.

i.e. Using the format above put("Z", $parent.) would give "?". Without the OTHER entry it would give "Z".

If you didn't want the OTHER entry then you should remove the whole of the block of code 'if eof then...' to the 'end;'. If you removed only the hlo='O' line then you will get a duplicate 'start' value.

mkeintz · Posted 03-15-2021 02:27 PM

This is a ready-made task for using hash objects in a data step:

DATA hierarchy;
  input child $ parent $ ;
cards;
B A
K B
C K
RUN;
data want (drop=_:);
  set hierarchy;
  if _n_=1 then do;
    if 0 then set hierarchy (rename=(child=_c parent=_p));
    declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
      h.definekey('_C');
      h.definedata('_P');
      h.definedone();
      h.output(dataset:'h');
  end;
  _rc=h.find(key:parent);
  if _rc=0 then do;
    grandparent=_P;
    _rc=h.find(key:grandparent);
    if _rc=0 then great_grandparent=_P;
  end;
  if great_grandparent^=' ';
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

_el_doredo · Posted 03-17-2021 11:49 AM

Hi Mkeintz,
Thanks for your solution... It working fine as expected.. I am not familiar with hash approach.. So thanks for introducing one new concept to me...

mkeintz · Posted 03-18-2021 12:31 PM

I said in a separate response that I preferred the hash approach to the sql approach, primarily because the hash approach is easier to expand to more generations. Here's a hash usage that goes to 6 generations, with only the most minor modifications to the program. (I have added 2 generations to your sample):

DATA hierarchy;
  input child $ parent $ ;
cards;
B A
K B
C K
P C
O P
RUN;

data want (drop=_:);
  set hierarchy;
  if _n_=1 then do;
    if 0 then set hierarchy (rename=(child=_c parent=_p));
    declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
      h.definekey('_C');
      h.definedata('_P');
      h.definedone();
      h.output(dataset:'h');
  end;
  array lineage {*} $1 child parent gpar g_gpar gg_gpar ggg_gpar;
  do _i=2 to dim(lineage)-1 while (h.find(key:lineage{_i})=0);
    lineage{_i+1}=_p;
  end;
  if lineage{dim(lineage)}^=' ';
run;

To modify this to any number of generations, all you have to do is to modify the number of variables named in the LINEAGE array. Compare that to what would be required in an SQL approach.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 03-16-2021 08:00 AM

DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;

data have;
set hierarchy;
rename parent=_start child=_end;
run;

 
proc sql;
create table ancient as
 select _start,_end from have
  where _start not in (select distinct _end from have);
run;
data want(keep=path);
if _n_ eq 1 then do;
length path _path  $ 400 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
   
pa.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;  
 _start=scan(path,-1,'|');
 rc=ha.find();if rc ne 0 then output;
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add();
   path=_path;
 end; else output;
  rc=ha.find_next();
end;
end;
pa.clear();
run;

_el_doredo · Posted 03-17-2021 11:50 AM

Hi Ksharp,
Thanks for the solution.Let me try this from my end

How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Re: How to find great grand parents in the dataset

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away