I'm stuck trying to go from this table:
cid | parent_cid |
a | |
b | a |
c | b |
d | a |
e | |
f | |
g | f |
To this table:
cid | sort_cid | sort_chain | level |
a | a | 1.1.1 | 1 |
a | b | 1.2.1 | 2 |
a | c | 1.2.2 | 3 |
a | d | 1.3.1 | 2 |
b | a | 1.1.1 | 1 |
b | b | 1.2.1 | 2 |
b | c | 1.2.2 | 3 |
c | a | 1.1.1 | 1 |
c | b | 1.2.1 | 2 |
c | c | 1.2.2 | 3 |
d | a | 1.1.1 | 1 |
d | d | 1.3.1 | 2 |
e | e | 2.1.1 | 1 |
f | f | 3.1.1 | 1 |
f | g | 3.2.1 | 2 |
g | f | 3.1.1 | 1 |
g | g | 3.2.1 | 2 |
In short, I want for every cid every parent and child cid in the 'chain' as sort_cid. Next to that, for every sort_cid I need a dot-separated chain of numbers, to be able to sort the sort_cid in a Viya Dashboard. I also need to know which 'level' each sort_cid is, e.g. if it is the main parent (level 1) or it has two parents (level 3).
Any help would be much appreciated!
Why is level = 2 in the 4th obs?
Because a is the parent of d. So d is the child of a. Since a = level 1, then d = level 2. Another way to see it: b and d are siblings :). This can be deducted from level = 2.
What I want to achieve:
I want to use the level-column to put spaces in front of the cid (which in my real data is a company name), together with the sort_chain to sort the data in viya. My final result in Viya should look like this:
a
b
c
d
e
f
g
In Viya I want to select a company (using cid), and then want to display the full company-chain to the user.
So if a user selects 'a', they should see this in Viya in a list table:
a
b
c
d
The sort_chain is used to make sure the results are always displayed in this order.
I do not get your output. But here is a simple way you can load the parent/child pairs into a hash object and use that to generate a list of each person and which family and what level within they family they are.
data have ;
input cid $ parent_cid $;
cards;
a .
b a
c b
d a
e .
f .
g f
;
data want;
set have ;
where missing(parent_cid);
family+1;
if _n_=1 then do;
declare hash h(multidata:'yes',dataset:'have(where=(not missing(parent_cid)))');
h.definekey('parent_cid');
h.definedata('cid');
h.definedone();
end;
do generation=1 by 1 until(h.find());
output;
parent_cid=cid;
end;
run;
Result
parent_ Obs cid cid family generation 1 a 1 1 2 b a 1 2 3 c b 1 3 4 e 2 1 5 f 3 1 6 g f 3 2
Hi Tom,
Thank you for taking the time to help me out!
I have replied above to another comment to further explain my output (and why it is needed).
I have no experience with hash-objects, so I do not fully understand the workings of the code.
The output you have generates is a step in the right direction, but record [d a] is missing. I do not fully understand from the code why that would happen (unless a hash in SAS is the equivalent of a dictionary in Python? Then one cannot have duplicate keys).
Hope this together with my other comment clarifies it a bit. I understand the output might be 'strange', but Viya is a strange program to work with, so sometimes unconventional tables must be loaded into it to create the output a user wants 😉
I think my brain melted trying to understand the 'sort_chain' variable, but the following code produces the rest of them in the format you specified - I think I'm just not understanding how to derive sort_chain from this.
data have;
infile cards dsd truncover firstobs=1 dlm=',';
length cid parent_cid $1;
input cid parent_cid;
cards;
a,
b,a
c,b
d,a
e,
f,
g,f
;
run;
proc sql noprint;
create table c2num as
select 'fc2num' as fmtname length=8,
'C' as type length=1, cid as start, monotonic() as label
from have;
quit;
data num2c (rename=(_start=start _label=label));
set c2num (rename=(start=_label label=_start));
fmtname='fnum2c';
type='N';
run;
data c2num;
set c2num end=last;
output;
n+1;
if last then do;
start=' ';
label=0;
output;
call symputx("ncids",n);
end;
drop n;
run;
proc format cntlin=c2num; run;
proc format cntlin=num2c; run;
data want;
set have (rename=(cid=_CID parent_CID=_PC)) end=last;
array nparents {0:&ncids} _temporary_ (%eval(&ncids+1)*0);
array p {&ncids} _temporary_;
nparents[put(_CID,$fc2num.)*1]+nparents[put(_PC,$fc2num.)*1]+1;
p[put(_CID,$fc2num.)*1]=put(_PC,$fc2num.)*1;
if last then do;
do i=1 to dim(p);
cid=put(i,fnum2c.);
sort_cid=cid;
level=nparents[i];
output;
do j=1 to dim(p);
if p[j]=i then do;
sort_cid=put(j,fnum2c.);
level=nparents[j];
output;
do k=1 to dim(p);
if p[k]=j then do;
sort_cid=put(k,fnum2c.);
level=nparents[k];
output;
end;
end;
end;
end;
level=nparents[i];
_i=i;
do while (level>1);
sort_cid=put(p[_i],fnum2c.);
level=nparents[p[_i]];
output;
_i=p[_i];
end;
end;
end;
keep cid sort_cid level;
run;
proc sort data=want; by cid sort_cid; run;
proc print data=want; run;
So you just need to calculate the depth for each CID and which CID is at the top of their tree. Then you can use BY group processing to create your various counters.
data want;
* Load the cid->parent_cid mapping into a hash ;
if _n_=1 then do;
declare hash h(dataset:'have');
h.definekey('cid');
h.definedata('parent_cid');
h.definedone();
end;
* Load next CID value ;
set have ;
* Find their top ancestor and how deep they are ;
do depth=1 by 1 until(h.find());
top=cid;
cid=parent_cid;
end;
* Re-load the same CID value to reset CID and PARENT_CID ;
set have;
run;
proc sort;
by top depth cid ;
run;
* find how many levels will be needed ;
proc sql noprint;
select max(depth) into :md trimmed from want;
quit;
* Use FIRST. processing to get the counters ;
* Concatenate to make index string ;
data numbered;
set want;
by top depth cid;
array c[&md] ;
c[depth]+1;
if first.depth then do;
do j=depth+1 to &md;
c[j]=1;
end;
end;
drop j ;
sort_chain=catx('.',of c[*]);
run;
Results
NOTE: If none of the individual counters exceed 9 then you can generate your sort key variable just using CATX() as in this program. But if you do have values like 10 then you might want to code a loop so you can use the Z format to generate strings like 1.10.1 and 1.09.1 instead. Those will sort properly.
That step reads the same dataset three times since it is also loaded into the HASH.
If we stick to the parent/child analogy we have A as the head of the first family, so 1 for the first place. D is the second child of A, so since A was number 1 for that second position D is number 3. And since the depth of D is two you have 1 for all of the other levels since D is the head that sub-tree.
Hi Tom,
Once again thank you for thinking along!
The resulting dataset looks almost right, however what does not seem correct is obs. 4: this should be 1.2.2, not 1.3.2. Also, the list needs the 'top' column to show 'b', and who are related to 'b' (which are 'a' and 'c').
I will give a full explanation why the format is needed, starting from my final table in Viya, working backwards towards the dataset I have.
My client wants to have an overview of all companies within a certain sector (these are 1000's of companies, so to answer your question: yes, many values will be larger than 10). These companies are related to each other (holding company, daughter companies, branches, etc.). A company can have multiple daughters/branches, but each company only one parent. In the dashboard, these companies need to be shown with their relation to each other in a listtable. Each company is operating in certain sub-sectors. So if the user selects a certain sub-sector, a subset of the companies is returned. BUT, and this is the important part: each company that is returned, must have all the parent/daughter (NOT sister) companies displayed, to understand where in the global sector a company operates.
The solution to this is to send a table like this to Viya*:
(note the spaces in front of sort_name, this is derived from the 'level' column in EG, level 1 = no spaces, level 2 = 2 spaces, level 3 = 4 spaces, etc.).
Note that each company (in another table) is linked to sub-sectors via datamapping.
I will use an excel pivot table to mimic the Viya behavior. The following listtable is created in Viya, normally the 'sort chain' column is hidden (now shown for clarity). The result would be:
Say the user select subsector 1. And companies 'b' and 'f' are active in that sector. Then the result would be:
'e' dropped out, nothing to do with the subsector. But 'a', 'c' and 'g' are shown, because they are related to a company in the chain. Note: in the final version, 'b' and 'f' will be highlighted to show they are the direct result of the filter, while the rest is to show the company structure within the sector (I have that part working if I load the desired output table into Viya when manually created from a small subset of the data).
The whole reason that the above table works, is because:
To further clarify point 3, while using my original example table in my original post:
Company 'a' is linked to 3 other companies. Hence, for cid 'a', we get sort_cid 'a' (itself), 'b' (first daughter), 'c' (first daughter of first daughter), 'd' (second daughter).
Company 'b' is linked to 2 other companies, being 'a' (its parent), and 'c' (its daughter). So 'b' inherits the first digit from 'a', and digit two gets a +1.
Company 'd' is linked to 1 other company, being 'a' (its parent). So 'd' inherits the first digit from 'a' and digit two gets a +1 (i.e. 3).
I have also created the below identical working example, but have reduces all sort_chain numbers by 1, maybe that is more logical for some people:
Data sent to Viya:
Basic Viya listtable:
User selects companies 'b' and 'f'
So you just want to add extra observations?
I do not see how yours is much different than what I produced.
(Hard to use your photographs since I cannot copy the text from a photograph).
It looks like you just want to add the extra observations to replicate the parents that have many children. So instead of just having A 1.1.1 a single time you want it replicated.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.