Hello! I have a dataset with several entities that belong to a group and I would like to assign a value to each entity to reflect their position within the group (e.g. 1 for the mother company, 2 for the daughters and so on). I would need a loop to cover all the records in the dataset. I can have multiple datasets so I would need to write code that could be applicable to different cases. Example:
Initial data set:
Company | Direct_parent | Group_parent |
A | E | E |
B | A | E |
C | E | E |
D | B | E |
E | E | |
F | B | E |
G | E | E |
H | A | E |
I | D | E |
J | D | E |
Desired result:
Company | Direct_parent | Group_parent | hierarchical_order |
A | E | E | 2 |
B | A | E | 3 |
C | E | E | 2 |
D | B | E | 4 |
E | E | 1 | |
F | B | E | 4 |
G | E | E | 2 |
H | A | E | 3 |
I | D | E | 5 |
J | D | E | 5 |
My problem is that the initial data set can vary. It may have 10 companies or 50. So I was trying to build a one size fits all code.
Any help on how to do this?
Many thanks!
Matos
Try this one:
EDITED
data have;
infile cards expandtabs;
input (Company Direct_parent Group_parent ) ($);
rename Company=_end Direct_parent=_start;
cards;
O . O
H O O
J O O
X O O
A H O
C H O
M X O
E A O
F A O
Z M O
K E O
J E O
I K O
N I O
D I O
G I O
L I O
P G O
V P O
Q P O
R P O
;
run;
%let start=O;
data want(keep=path);
if _n_ eq 1 then do;
length path _path $ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set have(where=(_start="&start"));
count=1;n=1;_n=1;
path=catx('|',_start,_end);
*putlog 'WARNING:Found ' _end;
pa.add();output;
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); /* if rc ne 0 then output; */
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
*putlog 'WARNING:Found ' _end;
count+1;n=count;
path=catx('|',path,_end);
pa.add(); output;
path=_path;
end; /* else output; It is a circle.*/
rc=ha.find_next();
end;
end;
pa.clear();
run;
data key;
set want;
do order=1 to countw(path);
key=scan(path,order,'|');lag_key=lag(key);
if order=1 then lag_key=' ';output;
end;
drop path;
run;
data final_want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('lag_key','key');
h.definedata('order');
h.definedone();
end;
set have;
call missing(order);
rc=h.find(key:_start,key:_end);
drop rc key lag_key;
run;
Welcome to the SAS community 🙂
Can you be a bit more specific about the logic here? Why is hierarchical_order equal to 2 in the first row? And 3 in the second?
Sure! In my example the logic is the following:
Company E is at the top since Group_Parent = E. Thus the hierarchical_order assigned to Company E is 1 (first level of the organization).
Company A, C and G are owned directly by Company E (which is at first level). Thus the hierarchical_order assigned to them is 2 (second level of the organization).
Company B and H are owned directly by Company A (which is at the second level). Thus the hierarchical_order assigned to them is 3 (third level of the organization). And so on.
This would be the organizational chart:
The hierarchical order assigned to a company will be dependent from the hierarchical order assigned to its direct parent company.
looking at your hierarchical order company ownership has nothing to do with the decision making process.
if company = 'E' then hierarchical_order = 1;
else if company in ('A','C','G') then hierarchical_order = 2;
else if company in ('B','H') then hierarchical_order = 3;
else if company in ('D','F'') then hierarchical_order = 4;
else if company in ('J','I') then hierarchical_order = 5;
else hierarchical_order = .;
You could also use a format statement to assign the hierarchical_order
Many thanks. However, this only works for this specific dataset. My problem is that my dataset is not stable since it is a result of a previous query that I need to run multiple times. As such it is dynamic and I don't know what will be the ownership structure. I will have to deal with datasets with 10 different company names or even 60 different company names. Some could end up to 20 hierachical order. Is it possible to define a code that could suit to different datasets, with different sizes?
Another example: here Company E has been replaced by Company O.
Company | Direct_parent | Group_parent | hierarchical_order |
O | O | ||
H | O | O | |
J | O | O | |
X | O | O | |
A | H | O | |
C | H | O | |
M | X | O | |
E | A | O | |
F | A | O | |
Z | M | O | |
K | E | O | |
J | E | O | |
I | K | O | |
N | I | O | |
D | I | O | |
G | I | O | |
L | I | O | |
P | G | O | |
V | P | O | |
Q | P | O | |
R | P | O |
based on the 2 samples that @Matos has provided I do not see any relationship in the samples. My reasoning is because A is a child to E in sample one, then in sample 2 E is the child to A. If A is representing the same company in both samples then maybe you want to rethink the process and setup multipedal formats that you would call append based on the incoming dataset.
Sound like you want to depth of the node in a tree search.
Do you have SAS/OR licensed? Can you run PROC NETDRAW?
Try this one:
EDITED
data have;
infile cards expandtabs;
input (Company Direct_parent Group_parent ) ($);
rename Company=_end Direct_parent=_start;
cards;
O . O
H O O
J O O
X O O
A H O
C H O
M X O
E A O
F A O
Z M O
K E O
J E O
I K O
N I O
D I O
G I O
L I O
P G O
V P O
Q P O
R P O
;
run;
%let start=O;
data want(keep=path);
if _n_ eq 1 then do;
length path _path $ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set have(where=(_start="&start"));
count=1;n=1;_n=1;
path=catx('|',_start,_end);
*putlog 'WARNING:Found ' _end;
pa.add();output;
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); /* if rc ne 0 then output; */
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
*putlog 'WARNING:Found ' _end;
count+1;n=count;
path=catx('|',path,_end);
pa.add(); output;
path=_path;
end; /* else output; It is a circle.*/
rc=ha.find_next();
end;
end;
pa.clear();
run;
data key;
set want;
do order=1 to countw(path);
key=scan(path,order,'|');lag_key=lag(key);
if order=1 then lag_key=' ';output;
end;
drop path;
run;
data final_want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('lag_key','key');
h.definedata('order');
h.definedone();
end;
set have;
call missing(order);
rc=h.find(key:_start,key:_end);
drop rc key lag_key;
run;
It works perfectly! Many thanks!
You need to change
%let start=E;
According to Group_parent .
For the another example.
data have;
infile cards expandtabs;
input (Company Direct_parent Group_parent ) ($);
rename Company=_end Direct_parent=_start;
cards;
A E E
B A E
C E E
D B E
E . E
F B E
G E E
H A E
I D E
J D E
;
run;
%let start=E; /***<---------****/
data want(keep=path);
if _n_ eq 1 then do;
length path _path $ 800 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set have(where=(_start="&start"));
count=1;n=1;_n=1;
path=catx('|',_start,_end);
*putlog 'WARNING:Found ' _end;
pa.add();output;
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); /* if rc ne 0 then output; */
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
*putlog 'WARNING:Found ' _end;
count+1;n=count;
path=catx('|',path,_end);
pa.add(); output;
path=_path;
end; /* else output; It is a circle.*/
rc=ha.find_next();
end;
end;
pa.clear();
run;
data key;
set want;
do order=1 to countw(path);
key=scan(path,order,'|');lag_key=lag(key);
if order=1 then lag_key=' ';output;
end;
drop path;
run;
data final_want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('lag_key','key');
h.definedata('order');
h.definedone();
end;
set have;
call missing(order);
rc=h.find(key:_start,key:_end);
drop rc key lag_key;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.