Hi Everyone,
A bit of an interesting one. I have the below table:
Source code:
data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20;
input L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;
Context:
The key to unlocking this question lies in the last level of every row i.e. the rightmost level present.
- The level above the last level in every case is the manager. e.g. on row one, Anthony (at L4) is the Manager of Kiki (at L5) and at the last row, David (at L2) is the Manager of Chris (at L3)
Objective:
Objective is to create a new variable called key. The value assigned to key is:
- a concatenation of all the lowest (leftmost) levels assigned to a manager. e.g. take a look at obs 2 and 3, Judas is the Manager of Drake and Alice. So the value of key for these two observations will be "Drake|Alice".
Another example, for observations 6 to 8, David is the manager of Lucas, Chris and Tim. For each of these, the value of key will be "Lucas|Tim|Chris"(OR
- Where only one staff is assigned to a manager (e.g at obs 1,4 and 5) the value assigned to key is that staff's name. e.g the value of Key for obs 1 is "kiki"
Expected result:
Please can anyone help?
Thanks
This will create the key variables;
data work.keys; set work.temp; by super; /* this creates maximum of 10 keys, replace with a known max if different*/ array key{10} $ 20; retain count key: ; if first.super then do; call missing (of key(*)); count=1; end; else count+1; key[count]=last; if last.super; keep super key: ; run;
I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.
Try this:
data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6_ID $20 L6_Name $20;
input L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;
data temp;
set Hierarchy;
row = _n_;
array m L1_ID -- L6_Name;
length Man Emp $20;
do i = dim(m) to 2 by -2 while(missing(Man));
if missing(Emp) then Emp = m{i};
else Man = catx("-", m{i-1}, m{i});
end;;
drop i;
run;
proc sort data=temp(keep=Man Emp) out=ManEmp; by Man Emp; run;
data keys;
length key $60;
do until(last.Man);
set ManEmp; by Man;
key = catx("|", key, Emp);
end;
drop Emp;
run;
proc sql;
create table want(drop=row Man) as
select keys.key, temp.*
from temp, keys
where temp.Man = Keys.Man
order by row;
quit;
Here's one way for the sample data.
Caveats: if any of your Names contain spaces then the calls to the countw function should explicitly limit the delimiter, if your names have commas, use a different character as the delimiter in the catx function call and then use that as the delimeter for countw.
Depending on the actual number of potential names in the key the length may need to be adjusted. If you actually have 20 character names then the length of key should be at least 20*(max number of expected supervisees)+(number of supervisees -1).
This does not deal at all with cases of the same name but different person. It really would make a tad more sensi to use the ID values instead of the name. If there are potentially a "125 David" and "456 David" then create a variable to combine the ID and Name to use for the potential "super" values in my work.temp. The same thing applies for the "key". What do you want if you have multiple supervisees as in :
Judas 505 Drake
Judas 542 Alice
Judas 588 Alice
data Hierarchy; infile datalines dlm='' truncover; length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20; input L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $; cards; 101 Anne 125 David 202 James 312 Anthony 404 Kiki 101 Anne 123 Chun 202 James 345 Judas 505 Drake 101 Anne 125 David 231 Chris 345 Judas 542 Alice 101 Anne 123 Chun 231 Chris 391 Jacob 101 Anne 123 Chun 234 Atem 101 Anne 125 David 239 Lucas 101 Anne 125 David 266 Tim 101 Anne 125 David 231 Chris ; run; data work.temp; set Hierarchy; array nm L1_Name L2_Name L3_Name L4_Name L5_Name L6_Name; super= nm[countw(catx(',',of nm(*)))-1]; last = nm[countw(catx(',',of nm(*))) ]; run; proc sort data=work.temp; by super last; run; data work.keys; set work.temp; by super; length key $ 60; retain key; if first.super then key=last; else key=catx('|',key,last); if last.super; keep super key; run; proc sql; create table want as select b.key, a.L1_ID, a.L1_Name, a.L2_ID, a.L2_Name, a.L3_ID , a.L3_Name, a.L4_ID, a.L4_Name, a.L5_ID, a.L5_Name, a.L6_ID, a.L6_Name from work.temp as a left join work.keys as b on a.super = b.super ; quit;
Hi Ballardw,
Thanks for your suggestions. This is actually an oversimplification. I just thought people relate better to names than IDs hence why i went for names. However I have ensured there aren't any multiple names with the same ID.
Your solution works. But can i ask for a slight adjustment? Instead of assigning the concatenation to Key, can you create new variables Key1, Key2, Key3 .....Keyn to hold the individual values of those names e.g last three observations will be displayed as below (just an excerpt of original).
However, where there is a one to one relationship between manager and staff, the staff name will be assigned to key1.
Thanks,
This will create the key variables;
data work.keys; set work.temp; by super; /* this creates maximum of 10 keys, replace with a known max if different*/ array key{10} $ 20; retain count key: ; if first.super then do; call missing (of key(*)); count=1; end; else count+1; key[count]=last; if last.super; keep super key: ; run;
I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.