BookmarkSubscribeRSS Feed
_el_doredo
Quartz | Level 8

Hi Experts,

I need to find out great grandparent for the child using this dataset.

DATA hierarchy;

input parent $ child $;

cards;

A B

B K

K C

;

RUN;

In this dataset, Value C's parent is K and C's grandparent is B and C's great grandparent A. So i need result like this

_el_doredo_0-1615830145745.png


I tried with PROC SQL. I am able to find out till Grandparent but i am not able to find out Great_grandparent hierarchy.


I used this code

proc sql;
select child,parent,
(select parent from hierarchy where child=a.parent) as grandparent
from hierarchy as a
where child='C';
quit;

 

but i am not able to find great grandparent...

 

Can you guys please suggest a way to achieve this use proc sql itself

 

Thanks in advance

 

Regards,

Syed

15 REPLIES 15
Reeza
Super User
Have you tried repeating that one more time? Separate it into two separate steps.

If you have SAS/OR you can also look at PROC BOM or there's a subgraph macro. Your problem is analogous to a bill of materials problem where one item is composed of others or the graph problem of looking for a path or node in a graph.
_el_doredo
Quartz | Level 8

Hi Reeza,

So far i didn't used PROC BOM or subgraph... If you send a code using PROC BOM or subgraph means it will be useful for my learning.

 

I tried with repeating the code one more time but not getting expected results..

 

Thanks,
Syed

Reeza
Super User
Please show what you've tried. You really are just repeating the exact same logic one more time, replacing the parents with the grandparents to find the great grand parents.
_el_doredo
Quartz | Level 8

Hi Reez,

I tried this way and it worked.I am looking for any better code to use with all type of hierarchy scenarios

 

data hierarchy;
input parent $ child $ ;
cards;
A B
B K
K C
;
run;

proc sql;
create table master as
select A.parent from hierarchy as A
union all
select A.child from hierarchy as A
where A.child not in (select B.parent from hierarchy as B)
;
quit;

data dummy(drop=counter);
set master;
counter=_n_;
if counter=1 then relationship='great grand parent';
else if counter=2 then relationship='grand parent';
else if counter=3 then relationship='parent';
else if counter=4 then relationship='child';
proc print noobs;
run;

proc transpose data=dummy out=final(drop=_name_);
id relationship;
var parent;
run;

JonathanWill
Obsidian | Level 7

You could create a format to do the child->parent lookup and then use this to create the grandparent/greatgrandparent in a datastep

 

data fmt_hier / view=fmt_hier;
  retain 
    fmtName 'parent'
    type 'c'
  ;
  
  set hierarchy(rename=(child=start parent=label)) end=eof;
  output;
  
  if eof then do;
    hlo='O';
    label='?';
    output;
  end;
run;
proc format cntlin=fmt_hier;
run;

data parents;
  set hierarchy;
  
  grandparent = put(parent, $parent.);
  greatgrandparent = put(grandparent, $parent.);
run;

proc print data=parents;
  id child;
run;

Result:

JonathanWilliamson_1-1615832560105.png

 

 

PeterClemmensen
Tourmaline | Level 20

A hash approach

 

DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;

data want(drop = rc);

   format child parent grandparent greatgrandparent;

   if _N_ = 1 then do;
      dcl hash h(dataset : "hierarchy");
      h.definekey("child");
      h.definedata("parent");
      h.definedone();
   end;
   
   set hierarchy;

   rc = h.find(key : parent);

   if rc = 0 then do;
      grandparent = parent;
      rc = h.find(key : grandparent);
   end;

   if rc = 0 then greatgrandparent = parent;

   rc = h.find();

run;

 

Result:

 

child parent grandparent greatgrandparent 
B     A          
K     B      A   
C     K      B           A 
_el_doredo
Quartz | Level 8
Hi Clemmensen,
Thanks for your solution... It working fine as expected.. I am not familiar with hash approach.. So thanks for introducing one new concept to me...
_el_doredo
Quartz | Level 8
Hi William,
Thanks for you solution.. It is working perfectly fine. But, I have few doubts..
1) What's the purpose of using type 'C'. Is it used to mention fmtname is character variable?. I tried with using Dollar symbol($) and _CHARACTER_ but it throughing some error... It will be helpful if i got to know the purpose

2)Why we are using hlo='0'. I know it has to do with formatting.. Because it throughing error if we removed that line. So i am not sure about the purpose of this..
JonathanWill
Obsidian | Level 7

@_el_doredo 

1) Correct.  The type = 'C' identifies it as a character format.   'N' would be numeric format, 'I' numeric informat.   

 

2) The if eof block that contains hlo='O' creates an extra entry in the format that specifies the OTHER value.  So this will be used for any input that is not in the known values.   If you do not have this in the format any unknown input value will simply give the input value back. 

i.e. Using the format above put("Z", $parent.) would give "?". Without the OTHER entry it would give "Z".

If you didn't want the OTHER entry then you should remove the whole of the block of code 'if eof then...' to the 'end;'.  If you removed only the hlo='O' line then you will get a duplicate 'start' value.  

mkeintz
PROC Star

This is a ready-made task for using hash objects in a data step:

 

DATA hierarchy;
  input child $ parent $ ;
cards;
B A
K B
C K
RUN;
data want (drop=_:);
  set hierarchy;
  if _n_=1 then do;
    if 0 then set hierarchy (rename=(child=_c parent=_p));
    declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
      h.definekey('_C');
      h.definedata('_P');
      h.definedone();
      h.output(dataset:'h');
  end;
  _rc=h.find(key:parent);
  if _rc=0 then do;
    grandparent=_P;
    _rc=h.find(key:grandparent);
    if _rc=0 then great_grandparent=_P;
  end;
  if great_grandparent^=' ';
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
_el_doredo
Quartz | Level 8
Hi Mkeintz,
Thanks for your solution... It working fine as expected.. I am not familiar with hash approach.. So thanks for introducing one new concept to me...
mkeintz
PROC Star

I said in a separate response that I preferred the hash approach to the sql approach, primarily because the hash approach is easier to expand to more generations.  Here's a hash usage that goes to 6 generations, with only the most minor modifications to the program.  (I have added 2 generations to your sample):

 

DATA hierarchy;
  input child $ parent $ ;
cards;
B A
K B
C K
P C
O P
RUN;

data want (drop=_:);
  set hierarchy;
  if _n_=1 then do;
    if 0 then set hierarchy (rename=(child=_c parent=_p));
    declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
      h.definekey('_C');
      h.definedata('_P');
      h.definedone();
      h.output(dataset:'h');
  end;
  array lineage {*} $1 child parent gpar g_gpar gg_gpar ggg_gpar;
  do _i=2 to dim(lineage)-1 while (h.find(key:lineage{_i})=0);
    lineage{_i+1}=_p;
  end;
  if lineage{dim(lineage)}^=' ';
run;

To modify this to any number of generations, all you have to do is to modify the number of variables named in the LINEAGE array.  Compare that to what would be required in an SQL approach.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;

data have;
set hierarchy;
rename parent=_start child=_end;
run;

 
proc sql;
create table ancient as
 select _start,_end from have
  where _start not in (select distinct _end from have);
run;
data want(keep=path);
if _n_ eq 1 then do;
length path _path  $ 400 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
   
pa.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;  
 _start=scan(path,-1,'|');
 rc=ha.find();if rc ne 0 then output;
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add();
   path=_path;
 end; else output;
  rc=ha.find_next();
end;
end;
pa.clear();
run;
_el_doredo
Quartz | Level 8
Hi Ksharp,
Thanks for the solution.Let me try this from my end

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1588 views
  • 8 likes
  • 6 in conversation