Hello,
I have 2 variables in a dataset viz child code and parent code. I want to filter for all the childrens within a parent code.
Eg:
Child code | Mother code |
2 | 1 |
3 | 2 |
4 | 2 |
5 | 2 |
6 | 3 |
7 | 6 |
8 | 7 |
From above sample dataset, If i select 2 I want all the childrens and grand childrens under 2 ie 3, 4, 5, 6, 7, 8 and if i select 3 as a parent code then 6, 7, 8 etc..
Kindly help..
Thanks,
Ravi
Here is a simple solution using an index.
data have(index=(child / unique));
input Child Mother;
datalines;
2 1
3 2
4 2
5 2
6 3
7 6
8 7
;
data want;
set have;
ancestor = mother;
descendent = child;
output;
do i = 1 to 100 until(_error_); /* catch infinite loops */
child = mother;
set have key=child / unique;
if not _error_ then do;
ancestor = mother;
output;
end;
end;
_error_ = 0;
keep ancestor descendent;
run;
proc sort data=want; by ancestor descendent; run;
proc print; run;
How are you specifying the 2/3? Can you specify multiples ie 2 & 3?
What do you want the output to look like?
Thanks for your reply Reeza.
Yes, there should be an ability to specify 2 and 3.
Output will look like:
Mother code | Child code |
2 | 3 |
2 | 4 |
2 | 5 |
2 | 6 |
2 | 7 |
3 | 6 |
3 | 7 |
3 | 8 |
7 | 8 |
Hope this helps.
Thanks,
Ravi
Is there a limit to the number of generations between mother and descendant that you want?
Thanks for your reply PG Stats.
There is no set limit as such for the generation of rows. The generations will be until the last child code is available in a relationship.
Thanks,
Ravi Megharaj
This is a traverse the graph problem.
One way to accomplish this is to identify all clusters and then you can filter what you need from final table.
This code is @PGStats solution for this type of problem. If it's not, he'll correct me
https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
Unfortunately, the subGraph macro cannot solve this problem because it is meant for undirected graphs. But as we all know the parent-child relationship is directed . The subgraph macro would thus link a person to all members of his/her family.
Here is a simple solution using an index.
data have(index=(child / unique));
input Child Mother;
datalines;
2 1
3 2
4 2
5 2
6 3
7 6
8 7
;
data want;
set have;
ancestor = mother;
descendent = child;
output;
do i = 1 to 100 until(_error_); /* catch infinite loops */
child = mother;
set have key=child / unique;
if not _error_ then do;
ancestor = mother;
output;
end;
end;
_error_ = 0;
keep ancestor descendent;
run;
proc sort data=want; by ancestor descendent; run;
proc print; run;
PG, What if there is dead loop ? Like: 2 1 3 2 1 3
Looping will stop when i reaches 100. A check could (should) be added for this condition
if i > 99 then ERROR "Invalid data involving descendant " descendent;
else _ERROR_ = 0;
(untested)
PG, What if there are actually more than 100 strat/subject , i.e. 120 different subject ?
The loop travels the tree from child to its most ancient ancestor. So you would need data about more than 100 generations (not subjects) to reach the limit. That limit could be increased of course, but 100 seemed reasonable to me.
Thanks a lot PGStats. Your solution helps and resolves my issue.
I tried to implement FIFO (first in first out stack) concept to find all the child. User need to provide mother id in macro variable
search_val
The final desired result would come in work.output_list.
%let search_val=2;
data input;
input C_cd M_cd;
datalines;
2 1
3 2
4 2
5 2
6 3
7 6
8 7
;
proc sql;
create table parent_child as
select a.c_cd as ID,
a.M_cd as mother, b.C_cd as child,
put(a.M_cd,8.) || ", "||put(a.c_cd,8.)|| ", "||put(b.C_cd ,8.)as list
from input a left join input b
on a.C_cd = b.M_cd;
quit;
%let cnt_val =1;
proc sql;
drop table fifo_list ;
drop table output_list ;
drop table new_list;
quit;
data fifo_list;
child=&search_val;
run;
%macro fifo;
%start: %put &cnt_val ;
data _null_;
set fifo_list (obs=1);
call symputx('val',child);
run;
proc sql;
delete from fifo_list where child=&val;
create table new_list as
select child from parent_child
where ID=&val and child <> . ;
quit;
proc append base=fifo_list data=new_list;run;
proc append base=output_list data=new_list;run;
proc sql;
select count(*) into :cnt_val
from fifo_list where child <> . ; ;
quit;
%if &cnt_val > 0 %then %goto start;
%mend;
%fifo;
Here is mine. Check LOG after running code. data have; input _end _start ; cards; 2 1 3 2 4 2 5 2 6 3 7 6 8 7 ; run; %let start=2; data want(keep=path); if _n_ eq 1 then do; length path _path $ 700 ; if 0 then set have; declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y'); ha.definekey('_start'); ha.definedata('_end'); ha.definedone(); declare hash pa(ordered:'y'); declare hiter hi_path('pa'); pa.definekey('n'); pa.definedata('n','path'); pa.definedone(); end; set have(where=(_start=&start)); count=1;n=1;_n=1; path=catx('|',_start,_end); putlog 'WARNING:Found ' _end; pa.add();output; do while(hi_path.next()=0); if n ne 1 then pa.remove(key:_n);_n=n; _path=path; _start=scan(path,-1,'|'); rc=ha.find(); do while(rc=0); if not findw(path,strip(_end),'|') then do; if length(path)+length(_end)+1 gt lengthc(path) then do; putlog 'ERROR: The length of path and _path are set too short'; stop; end; putlog 'WARNING:Found ' _end; count+1;n=count; path=catx('|',path,_end); pa.add(); output; path=_path; end; rc=ha.find_next(); end; end; pa.clear(); run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.