BookmarkSubscribeRSS Feed
RVO
Fluorite | Level 6 RVO
Fluorite | Level 6

I'm stuck trying to go from this table:

cidparent_cid
a 
ba
cb
da
e 
f 
gf

 

To this table:

cidsort_cidsort_chainlevel
aa1.1.11
ab1.2.12
ac1.2.23
ad1.3.12
ba1.1.11
bb1.2.12
bc1.2.23
ca1.1.11
cb1.2.12
cc1.2.23
da1.1.11
dd1.3.12
ee2.1.11
ff3.1.11
fg3.2.12
gf3.1.11
gg3.2.12

 

In short, I want for every cid every parent and child cid in the 'chain' as sort_cid. Next to that, for every sort_cid I need a dot-separated chain of numbers, to be able to sort the sort_cid in a Viya Dashboard. I also need to know which 'level' each sort_cid is, e.g. if it is the main parent (level 1) or it has two parents (level 3).

Any help would be much appreciated!

22 REPLIES 22
PeterClemmensen
Tourmaline | Level 20

Why is level = 2 in the 4th obs?

RVO
Fluorite | Level 6 RVO
Fluorite | Level 6

Because a is the parent of d. So d is the child of a. Since a = level 1, then d = level 2. Another way to see it: b and d are siblings :). This can be deducted from level = 2.

What I want to achieve:
I want to use the level-column to put spaces in front of the cid (which in my real data is a company name), together with the sort_chain to sort the data in viya. My final result in Viya should look like this:

a

  b

     c

  d

e

f

  g

 

In Viya I want to select a company (using cid), and then want to display the full company-chain to the user.

So if a user selects 'a', they should see this in Viya in a list table:

a

  b

     c

  d

 

The sort_chain is used to make sure the results are always displayed in this order.

Tom
Super User Tom
Super User

I do not get your output.  But here is a simple way you can load the parent/child pairs into a hash object and use that to generate a list of each person and which family and what level within they family they are.  

data have ;
  input cid $	parent_cid $;
cards;
a	.
b	a
c	b
d	a
e	.
f	.
g	f
;

data want;
  set have ;
  where missing(parent_cid);
  family+1;
if _n_=1 then do;
  declare hash h(multidata:'yes',dataset:'have(where=(not missing(parent_cid)))');
  h.definekey('parent_cid');
  h.definedata('cid');
  h.definedone();
end;
 do generation=1 by 1 until(h.find());
    output;
    parent_cid=cid;
 end;
run;

Result

              parent_
Obs    cid      cid      family    generation

 1      a                   1           1
 2      b        a          1           2
 3      c        b          1           3
 4      e                   2           1
 5      f                   3           1
 6      g        f          3           2

RVO
Fluorite | Level 6 RVO
Fluorite | Level 6

Hi Tom,

 

Thank you for taking the time to help me out!

I have replied above to another comment to further explain my output (and why it is needed).

I have no experience with hash-objects, so I do not fully understand the workings of the code.

The output you have generates is a step in the right direction, but record [d a] is missing. I do not fully understand from the code why that would happen (unless a hash in SAS is the equivalent of a dictionary in Python? Then one cannot have duplicate keys).

 

Hope this together with my other comment clarifies it a bit. I understand the output might be 'strange', but Viya is a strange program to work with, so sometimes unconventional tables must be loaded into it to create the output a user wants 😉

quickbluefish
Barite | Level 11

I think my brain melted trying to understand the 'sort_chain' variable, but the following code produces the rest of them in the format you specified - I think I'm just not understanding how to derive sort_chain from this.


data have;
infile cards dsd truncover firstobs=1 dlm=',';
length cid parent_cid $1;
input cid parent_cid;
cards;
a,
b,a
c,b
d,a
e,
f,
g,f
;
run;

proc sql noprint;
create table c2num as
select 'fc2num' as fmtname length=8, 
'C' as type length=1, cid as start, monotonic() as label
from have;
quit;

data num2c (rename=(_start=start _label=label));
set c2num (rename=(start=_label label=_start));
fmtname='fnum2c';
type='N';
run;

data c2num;
set c2num end=last;
output;
n+1;
if last then do;
	start=' ';
	label=0;
	output;
	call symputx("ncids",n);
end;
drop n;
run;

proc format cntlin=c2num; run;
proc format cntlin=num2c; run;

data want;
set have (rename=(cid=_CID parent_CID=_PC)) end=last;
array nparents {0:&ncids} _temporary_ (%eval(&ncids+1)*0);
array p {&ncids} _temporary_;
nparents[put(_CID,$fc2num.)*1]+nparents[put(_PC,$fc2num.)*1]+1;
p[put(_CID,$fc2num.)*1]=put(_PC,$fc2num.)*1;
if last then do;
	do i=1 to dim(p);
		cid=put(i,fnum2c.);
		sort_cid=cid;
		level=nparents[i];
		output;
		do j=1 to dim(p);
			if p[j]=i then do;
				sort_cid=put(j,fnum2c.);
				level=nparents[j];
				output;
				do k=1 to dim(p);
					if p[k]=j then do;
						sort_cid=put(k,fnum2c.);
						level=nparents[k];
						output;
					end;
				end;
			end;
		end;
		level=nparents[i];
		_i=i;
		do while (level>1);
			sort_cid=put(p[_i],fnum2c.);
			level=nparents[p[_i]];
			output;
			_i=p[_i];
		end;
	end;
end;
keep cid sort_cid level;
run;

proc sort data=want; by cid sort_cid; run;
proc print data=want; run;

quickbluefish_0-1732127253825.png

 

 

 

 

RVO
Fluorite | Level 6 RVO
Fluorite | Level 6
Hi quickbluefish,

Thank you as well for trying to help me out! Your output table looks (except for the sort_chain) like my output table! I have in another reply tried to better explain the sort_chain (how it works, a different variant that might feel more 'logical', and why it is needed).

My main problem is: I do not understand how your code works (I've only just started to program in SAS EG).
On a high level I understand that you loop over parent>child (c2num?) and child>parent (num2c?) tables, to get the parents of every child, and the children of every parent. But I do not understand most parts of the code what it does.
Tom
Super User Tom
Super User

So you just need to calculate the depth for each CID and which CID is at the top of their tree.  Then you can use BY group processing to create your various counters. 

 

data want;
* Load the cid->parent_cid mapping into a hash ;
  if _n_=1 then do;
    declare hash h(dataset:'have');
    h.definekey('cid');
    h.definedata('parent_cid');
    h.definedone();
  end;
* Load next CID value ;
  set have ;
* Find their top ancestor and how deep they are ;
  do depth=1 by 1 until(h.find());
    top=cid;
    cid=parent_cid;
  end;
* Re-load the same CID value to reset CID and PARENT_CID ;
  set have;
run;

proc sort;
  by top depth cid ;
run;

* find how many levels will be needed ;
proc sql noprint;
  select max(depth) into :md trimmed from want;
quit;

* Use FIRST. processing to get the counters ;
* Concatenate to make index string ;
data numbered;
  set want;
  by top depth cid;
  array c[&md] ;
  c[depth]+1;
  if first.depth then do;
     do j=depth+1 to &md;
       c[j]=1;
     end;
  end;
  drop j ;
  sort_chain=catx('.',of c[*]);
run;

Results

Tom_0-1732134842977.png

NOTE:   If none of the individual counters exceed 9 then you can generate your sort key variable just using CATX() as in this program.  But if you do have values like 10 then you might want to code a loop so you can use the Z format to generate strings like 1.10.1 and 1.09.1 instead.  Those will sort properly.

quickbluefish
Barite | Level 11
That's really interesting - I understand the code more or less (I've never set the same dataset twice -seems much less awkward than having to store the original values in separate variables) and understand how you're deriving the sort chain now, but I still don't think I follow what the purpose of it is - like, what is "1.3.1" actually telling us as it relates to CID='d', parent_CID='a'? Doesn't 'd' have a depth of 2?
Tom
Super User Tom
Super User

That step reads the same dataset three times since it is also loaded into the HASH.

 

If we stick to the parent/child analogy we have A as the head of the first family, so 1 for the first place.  D is the second child of A, so since A was number 1 for that second position D is number 3.  And since the depth of D is two you have 1 for all of the other levels since D is the head that sub-tree.

 

quickbluefish
Barite | Level 11
Mind blown... thank you!
RVO
Fluorite | Level 6 RVO
Fluorite | Level 6

Hi Tom,

 

Once again thank you for thinking along!

The resulting dataset looks almost right, however what does not seem correct is obs. 4: this should be 1.2.2, not 1.3.2. Also, the list needs the 'top' column to show 'b', and who are related to 'b' (which are 'a' and 'c').

I will give a full explanation why the format is needed, starting from my final table in Viya, working backwards towards the dataset I have.

My client wants to have an overview of all companies within a certain sector (these are 1000's of companies, so to answer your question: yes, many values will be larger than 10). These companies are related to each other (holding company, daughter companies, branches, etc.). A company can have multiple daughters/branches, but each company only one parent. In the dashboard, these companies need to be shown with their relation to each other in a listtable. Each company is operating in certain sub-sectors. So if the user selects a certain sub-sector, a subset of the companies is returned. BUT, and this is the important part: each company that is returned, must have all the parent/daughter (NOT sister) companies displayed, to understand where in the global sector a company operates.

 

The solution to this is to send a table like this to Viya*:

(note the spaces in front of sort_name, this is derived from the 'level' column in EG, level 1 = no spaces, level 2 = 2 spaces, level 3 = 4 spaces, etc.).

RVO_1-1732178490215.png

Note that each company (in another table) is linked to sub-sectors via datamapping.

I will use an excel pivot table to mimic the Viya behavior. The following listtable is created in Viya, normally the 'sort chain' column is hidden (now shown for clarity). The result would be:

RVO_3-1732178611121.png

 

Say the user select subsector 1. And companies 'b' and 'f' are active in that sector. Then the result would be:

RVO_4-1732178822468.png

 

'e' dropped out, nothing to do with the subsector. But 'a', 'c' and 'g' are shown, because they are related to a company in the chain. Note: in the final version, 'b' and 'f' will be highlighted to show they are the direct result of the filter, while the rest is to show the company structure within the sector (I have that part working if I load the desired output table into Viya when manually created from a small subset of the data).

 

The whole reason that the above table works, is because:

  1. I have the sort-chain column.
  2. I have the level-column to create spaces in front of c company name to make the result human-readable.
  3. And most important: for each company in the list all parents and children are in the table.

 

To further clarify point 3, while using my original example table in my original post:

Company 'a' is linked to 3 other companies. Hence, for cid 'a', we get sort_cid 'a' (itself), 'b' (first daughter), 'c' (first daughter of first daughter), 'd' (second daughter).

Company 'b'  is linked to 2 other companies, being 'a' (its parent), and 'c' (its daughter). So 'b' inherits the first digit from 'a', and digit two gets a +1.

Company 'd' is linked to 1 other company, being 'a' (its parent). So 'd' inherits the first digit from 'a' and digit two gets a +1 (i.e. 3).

 

I have also created the below identical working example, but have reduces all sort_chain numbers by 1, maybe that is more logical for some people:

Data sent to Viya:

RVO_5-1732179628978.png

 

Basic Viya listtable:

RVO_6-1732179635335.png

 

User selects companies 'b' and 'f'

RVO_7-1732179641824.png

 

 

Tom
Super User Tom
Super User

So you just want to add extra observations?

RVO
Fluorite | Level 6 RVO
Fluorite | Level 6
Yes, there will be additional observations (1000's of companies). And chains can be long: a company can have many children, grandchildren, grand-grandchildren, etc. (10+ levels deep).
Tom
Super User Tom
Super User

I do not see how yours is much different than what I produced.

(Hard to use your photographs since I cannot copy the text from a photograph).

 

It looks like you just want to add the extra observations to replicate the parents that have many children.  So instead of just having A 1.1.1 a single time you want it replicated.

 

Tom_0-1732196134326.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 22 replies
  • 2606 views
  • 3 likes
  • 4 in conversation