Hi Experts,
I need to find out great grandparent for the child using this dataset.
DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;
RUN;
In this dataset, Value C's parent is K and C's grandparent is B and C's great grandparent A. So i need result like this
I tried with PROC SQL. I am able to find out till Grandparent but i am not able to find out Great_grandparent hierarchy.
I used this code
proc sql;
select child,parent,
(select parent from hierarchy where child=a.parent) as grandparent
from hierarchy as a
where child='C';
quit;
but i am not able to find great grandparent...
Can you guys please suggest a way to achieve this use proc sql itself
Thanks in advance
Regards,
Syed
Hi Reeza,
So far i didn't used PROC BOM or subgraph... If you send a code using PROC BOM or subgraph means it will be useful for my learning.
I tried with repeating the code one more time but not getting expected results..
Thanks,
Syed
Hi Reez,
I tried this way and it worked.I am looking for any better code to use with all type of hierarchy scenarios
data hierarchy;
input parent $ child $ ;
cards;
A B
B K
K C
;
run;
proc sql;
create table master as
select A.parent from hierarchy as A
union all
select A.child from hierarchy as A
where A.child not in (select B.parent from hierarchy as B)
;
quit;
data dummy(drop=counter);
set master;
counter=_n_;
if counter=1 then relationship='great grand parent';
else if counter=2 then relationship='grand parent';
else if counter=3 then relationship='parent';
else if counter=4 then relationship='child';
proc print noobs;
run;
proc transpose data=dummy out=final(drop=_name_);
id relationship;
var parent;
run;
You could create a format to do the child->parent lookup and then use this to create the grandparent/greatgrandparent in a datastep
data fmt_hier / view=fmt_hier;
retain
fmtName 'parent'
type 'c'
;
set hierarchy(rename=(child=start parent=label)) end=eof;
output;
if eof then do;
hlo='O';
label='?';
output;
end;
run;
proc format cntlin=fmt_hier;
run;
data parents;
set hierarchy;
grandparent = put(parent, $parent.);
greatgrandparent = put(grandparent, $parent.);
run;
proc print data=parents;
id child;
run;
Result:
A hash approach
DATA hierarchy;
input parent $ child $;
cards;
A B
B K
K C
;
data want(drop = rc);
format child parent grandparent greatgrandparent;
if _N_ = 1 then do;
dcl hash h(dataset : "hierarchy");
h.definekey("child");
h.definedata("parent");
h.definedone();
end;
set hierarchy;
rc = h.find(key : parent);
if rc = 0 then do;
grandparent = parent;
rc = h.find(key : grandparent);
end;
if rc = 0 then greatgrandparent = parent;
rc = h.find();
run;
Result:
child parent grandparent greatgrandparent B A K B A C K B A
1) Correct. The type = 'C' identifies it as a character format. 'N' would be numeric format, 'I' numeric informat.
2) The if eof block that contains hlo='O' creates an extra entry in the format that specifies the OTHER value. So this will be used for any input that is not in the known values. If you do not have this in the format any unknown input value will simply give the input value back.
i.e. Using the format above put("Z", $parent.)
would give "?". Without the OTHER entry it would give "Z".
If you didn't want the OTHER entry then you should remove the whole of the block of code 'if eof then...' to the 'end;'. If you removed only the hlo='O' line then you will get a duplicate 'start' value.
This is a ready-made task for using hash objects in a data step:
DATA hierarchy;
input child $ parent $ ;
cards;
B A
K B
C K
RUN;
data want (drop=_:);
set hierarchy;
if _n_=1 then do;
if 0 then set hierarchy (rename=(child=_c parent=_p));
declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
h.definekey('_C');
h.definedata('_P');
h.definedone();
h.output(dataset:'h');
end;
_rc=h.find(key:parent);
if _rc=0 then do;
grandparent=_P;
_rc=h.find(key:grandparent);
if _rc=0 then great_grandparent=_P;
end;
if great_grandparent^=' ';
run;
I said in a separate response that I preferred the hash approach to the sql approach, primarily because the hash approach is easier to expand to more generations. Here's a hash usage that goes to 6 generations, with only the most minor modifications to the program. (I have added 2 generations to your sample):
DATA hierarchy;
input child $ parent $ ;
cards;
B A
K B
C K
P C
O P
RUN;
data want (drop=_:);
set hierarchy;
if _n_=1 then do;
if 0 then set hierarchy (rename=(child=_c parent=_p));
declare hash h (dataset:'hierarchy (rename=(child=_c parent=_p))');
h.definekey('_C');
h.definedata('_P');
h.definedone();
h.output(dataset:'h');
end;
array lineage {*} $1 child parent gpar g_gpar gg_gpar ggg_gpar;
do _i=2 to dim(lineage)-1 while (h.find(key:lineage{_i})=0);
lineage{_i+1}=_p;
end;
if lineage{dim(lineage)}^=' ';
run;
To modify this to any number of generations, all you have to do is to modify the number of variables named in the LINEAGE array. Compare that to what would be required in an SQL approach.
DATA hierarchy; input parent $ child $; cards; A B B K K C ; data have; set hierarchy; rename parent=_start child=_end; run; proc sql; create table ancient as select _start,_end from have where _start not in (select distinct _end from have); run; data want(keep=path); if _n_ eq 1 then do; length path _path $ 400 ; if 0 then set have; declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y'); ha.definekey('_start'); ha.definedata('_end'); ha.definedone(); declare hash pa(ordered:'y'); declare hiter hi_path('pa'); pa.definekey('n'); pa.definedata('n','path'); pa.definedone(); end; set ancient; count=1;n=1;_n=1; path=catx('|',_start,_end); pa.add(); do while(hi_path.next()=0); if n ne 1 then pa.remove(key:_n);_n=n; _path=path; _start=scan(path,-1,'|'); rc=ha.find();if rc ne 0 then output; do while(rc=0); if not findw(path,strip(_end),'|') then do; if length(path)+length(_end)+1 gt lengthc(path) then do; putlog 'ERROR: The length of path and _path are set too short'; stop; end; count+1;n=count; path=catx('|',path,_end); pa.add(); path=_path; end; else output; rc=ha.find_next(); end; end; pa.clear(); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.