Find all parents and children for each record. - Page 2

RVO · Posted 11-21-2024 09:04 AM

Correct. Note the sort_chain is 1-on-1 coupled to sort_name (each <space><space>b has sort_chain 1.2.1). This is as mentioned to use it in Viya as required.

I will show you where ours differ (and that is a very important difference):

The two highlighted records should be the same, but mine (correctly) shows 1.2.2, whereas your version shows 1.3.2, which is incorrect, since c is not a daughter of d (1.3.1).

I understand it is difficult to understand when it is not fully clear why an output is needed. But Viya has many peculiarities compared to coding a user interface where each visual can have SQL code behind it, especially if a company (like the one I work for) tries to misuse Viya for a CRM-system (that is the main problem ;))

Tom · Posted 11-21-2024 09:46 AM

So that is actually perhaps a little easier.

To start with build a string with CID values of all of the parents of the current node.

data want;
* Load the cid->parent_cid mapping into a hash ;
  if _n_=1 then do;
    declare hash h(dataset:'have');
    h.definekey('cid');
    h.definedata('parent_cid');
    h.definedone();
  end;
* Load next CID value ;
  set have ;
  length parents $20;
* Find their parents ;
  do depth=1 by 1 until(h.find(key:parent_cid));
    parents=catx('|',parent_cid,parents);
  end;
  keep cid parents depth ;
run;

Now you can generate the numbers you want.

proc sort;
  by parents cid;
run;

data want;
  set want;
  by parents ;
  order+1;
  if first.parents then order=1;
run;

And then again use a HASH to get the order numbers of the parents of the current node.

data numbered ;
  set want;
  if _n_=1 then do;
    declare hash h(dataset:'want');
    h.definekey('cid');
    h.definedata('order');
    h.definedone();
  end;
  array c[&md];
  c[depth]=order;
  do i=depth+1 to &md;
    c[i]=0;
  end;
  do i=1 to depth-1;
    h.find(key:scan(parents,i,'|'));
    c[i]=order;
  end;
  drop i;
  length sort_name $20 sort_index $30 ;
  sort_name=substr(repeat('A0'x,depth)||cid,3);
  sort_index=catx('.',of c[*]);
run;
proc sort;
  by sort_index;
run;

Result:

RVO · Posted 11-21-2024 10:09 AM

Hi Tom,

The final table really starts to look like it! You cannot understand how much I appreciate your help!

I figured out how to get it running so far, I will post it here for reference:

data have;
input cid $ parent_cid $;
cards;
a .
b a
c b
h c
d a
e .
f .
g f
run;


data want;
* Load the cid->parent_cid mapping into a hash ;
  if _n_=1 then do;
    declare hash h(dataset:'have');
    h.definekey('cid');
    h.definedata('parent_cid');
    h.definedone();
  end;
* Load next CID value ;
  set have ;
  length parents $20;
* Find their parents ;
  do depth=1 by 1 until(h.find(key:parent_cid));
    parents=catx('|',parent_cid,parents);
  end;
  keep cid parents depth ;
run;


proc sort;
  by parents cid;
run;

data want;
  set want;
  by parents ;
  order+1;
  if first.parents then order=1;
run;

* find how many levels will be needed ;
proc sql noprint;
  select max(depth) into :md trimmed from want;
quit;

data numbered ;
  set want;
  if _n_=1 then do;
    declare hash h(dataset:'want');
    h.definekey('cid');
    h.definedata('order');
    h.definedone();
  end;
  array c[&md];
  c[depth]=order;
  do i=depth+1 to &md;
    c[i]=0;
  end;
  do i=1 to depth-1;
    h.find(key:scan(parents,i,'|'));
    c[i]=order;
  end;
  drop i;
  length sort_name $20 sort_index $30 ;
  sort_name=substr(repeat('A0'x,depth)||cid,3);
  sort_index=catx('.',of c[*]);
run;

proc sort;
  by sort_index;
run;

(how do you get the formatting of the code with colors as per your post?)

Now the final part: how to get this for each cid duplicated (as per my original example)?
so cid a gets: a, b, c, h and d (in the version we are working on now)?

Tom · Posted 11-21-2024 12:26 PM

This forum will try to color code when you use the Insert SAS Code button (looks like the SAS running man icon) instead of the vanilla Insert Code icon.

If your data has cycles (A > C > B > A) then the DO loop will be an never end. One easy way to detect that is just put an upper limit on the depth.

  do depth=1 to 20 until(h.find(key:parent_cid));
    parents=catx('|',parent_cid,parents);
  end;
  if depth>20 then put 'ERROR: Possible cycle. ' cid= parents=;

I think perhaps you are asking how to add back in extra records for parents that have many children?

Try looking at just doing another DO loop. Probably with an OUTPUT statement in it to generate the extra observations. Or perhaps just use the max depth to determine how many levels of SQL rejoins to generate to add in the extra records.

RVO · Posted 11-21-2024 12:46 PM

Good tip on the cycles (I have it implemented in a similar way in another part of my code).

I have achieved my final table using the code below, but it is:

Hardcoded, any extra depth will not work.
Looks very inefficient.

Any ideas how to improve on this (both SQL and datasteps are ok solution, I just have some experience in the past with SQL hence why I used that)?

The below code block is 'insert SAS code', but still I get no formatting. I do get an HTML-error everytime I try to post the code though...

proc sql;
	/* parents of all cid's */
	CREATE TABLE parents1 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM numbered a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM numbered b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;

	
	CREATE TABLE parents2 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM parents1 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM parents1 b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;

	CREATE TABLE parents3 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM parents2 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM parents2 b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;



	/* children of all cid's */
	CREATE TABLE children1 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM numbered a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM numbered b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;

	
	CREATE TABLE children2 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM children1 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM children1 b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;

	CREATE TABLE children3 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM children2 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM children2 b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;


	CREATE TABLE TESTY AS
	SELECT DISTINCT cid, sort_name, sort_index
	FROM (
		SELECT *
		FROM parents3 a

		UNION ALL

		SELECT *
		FROM children3 b
		WHERE cid is not missing

	)
	ORDER BY cid, sort_index;

quit;

Which gives:

I removed 'h' child to compare 1-on-1 with my excel.

Full code for reference to create output:

%let sort_format=z1.;

data have;
	input cid $ parent_cid $;
	cards;
a .
b a
c b
d a
e .
f .
g f
run;

/*h c*/

data want;
	* Load the cid->parent_cid mapping into a hash;
	if _n_=1 then
		do;
			declare hash h(dataset:'have');
			h.definekey('cid');
			h.definedata('parent_cid');
			h.definedone();
		end;

	* Load next CID value;
	set have;

	if missing(parent_cid) then
		copy_parent_cid = cid;
	else copy_parent_cid = parent_cid;

	length parents $20;

	* Find their parents;
	do depth=1 by 1 until(h.find(key:parent_cid));
		parents=catx('|',parent_cid,parents);
	end;

	drop parent_cid;
run;

proc sort;
	by parents cid;
run;

data want;
	set want;
	by parents cid;
	order+1;

	if first.parents then
		order=1;
run;

* find how many levels will be needed;
proc sql noprint;
	select max(depth) into :md trimmed from want;
quit;

data numbered;
	set want;

	if _n_=1 then
		do;
			declare hash h(dataset:'want');
			h.definekey('cid');
			h.definedata('order');
			h.definedone();
		end;

	array L[&md] $10;
	L[depth]=put(order,&sort_format);

	do i=depth+1 to &md;
		L[i]=put(0,&sort_format);
	end;

	do i=1 to depth-1;
		h.find(key:scan(parents,i,'|'));
		L[i]=put(order,&sort_format);
	end;

	length sort_name $20 sort_index $30;
	sort_name=substr(repeat('A0'x,depth-1)||cid,2);
	sort_index=catx('.',of L[*]);
/*	drop i parents L: order;*/
run;

proc sort;
	by sort_index;
run;


proc sql;
	/* parents of all cid's */
	CREATE TABLE parents1 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM numbered a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM numbered b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;

	
	CREATE TABLE parents2 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM parents1 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM parents1 b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;

	CREATE TABLE parents3 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM parents2 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM parents2 b
	LEFT JOIN numbered c ON b.copy_parent_cid=c.cid
	;



	/* children of all cid's */
	CREATE TABLE children1 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM numbered a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM numbered b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;

	
	CREATE TABLE children2 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM children1 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM children1 b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;

	CREATE TABLE children3 AS
	SELECT a.cid, a.copy_parent_cid, a.sort_name, a.sort_index
	FROM children2 a

	UNION ALL

	SELECT c.cid, c.copy_parent_cid, b.sort_name, b.sort_index
	FROM children2 b
	LEFT JOIN numbered c ON b.cid=c.copy_parent_cid
	;


	CREATE TABLE TESTY AS
	SELECT DISTINCT cid, sort_name, sort_index
	FROM (
		SELECT *
		FROM parents3 a

		UNION ALL

		SELECT *
		FROM children3 b
		WHERE cid is not missing

	)
	ORDER BY cid, sort_index;

quit;

Tom · Posted 11-21-2024 08:48 AM

@RVO wrote:
Yes, there will be additional observations (1000's of companies). And chains can be long: a company can have many children, grandchildren, grand-grandchildren, etc. (10+ levels deep).

So did you try your visualization in VIYA with an example where one of the values between the periods is larger than 9?

1.1.1
1.2.1
1.3.1
...
1.9.1
1.10.1
1.11.1

Did it work as you wanted it? Did it sort properly?

Tom · Posted 11-21-2024 08:42 AM

Converting CID and LEVEL to SORT_NAME is simple using the REPEAT() function.

Note that the number passed to REPEAT() is the number of EXTRA copies to produce so it always produces at least one copy. If you want the SORT_NAME to look indented like that when printed to ODS destinations then use non-breaking space ('A0'x) instead of a normal space.

sort_name=substr(repeat('A0'X,depth)||cid,3);

RVO · Posted 11-21-2024 09:12 AM

Hi Tom, in reply to both your questions:

I understand that the current example breaks when obs > 9. But in my final code (already use it for other parts in my EG script) will use z5. (up to 9999 unique parent companies/daughters/etc.) Can always upgrade to z6., z7, etc.

An example where I already use it in another part of my script:

 sort_id = cats(sort_id, put(new_id, z3.));

Example result in Viya for that code:

FYI: Viya does not sort characters after the 5th character (in my testing), so I actually replace the sort_id buy a sort_number in code and use that in Viya:

data &table_out(drop=sort_id);
	set &table_out;
	by sort_id;

	if first.sort_id then
		hierarchy_sort_number+1;
run;

By using first. all sort_id's that are identical get the same sort_number.

Regarding repeat, using this for the example in this reply:

hierarchy_sort_name = substr(cat(repeat("  ", %eval(&i-1)), hierarchy_level_&i), 3);

The tip to use 'A0'x is much appreciated. I currently replace the spaces in Viya in a calculated column with this unicode character: https://www.compart.com/en/unicode/U+2800 (braille blank pattern), but I like your suggestion better. When I just tried to copy the unicode charater in EG, it gave me an error and refused to run the script.

The code this is part of actually creates the same effect, but there the hierarchy data is in separate columns (hierarchy_level1, hierarchy_level2, etc). This is just the way the source data is formatted. For some reason that was easy for me to create. But to achieve the same with only parent-child data, recursive coding (CTE?) is required, and I am unfamiliar with it.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Re: Find all parents and children for each record.

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away