BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
frupaul
Quartz | Level 8

Hi Everyone,

 

A bit of an interesting one. I have the below table:

Screen Shot 2018-09-07 at 22.07.13.png

Source code:

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;

Context:

The key to unlocking this question lies in the last level of every row i.e. the rightmost level present. 

- The level above the last level in every case is the manager. e.g. on row one, Anthony (at L4) is the Manager of Kiki (at L5) and at the last row, David (at L2) is the Manager of Chris (at L3)

 

Objective:

Objective is to create a new variable called key. The value assigned to key is:

- a concatenation of all the lowest (leftmost) levels assigned to a manager. e.g. take a look at obs 2 and 3, Judas is the Manager of Drake and Alice. So the value of key for these two observations will be "Drake|Alice".

Another example, for observations 6 to 8,  David is the manager of Lucas, Chris and Tim. For each of these, the value of key will be "Lucas|Tim|Chris"(OR

- Where only one staff is assigned to a manager (e.g at obs 1,4 and 5) the value assigned to key is that staff's name. e.g the value of Key for obs 1 is "kiki"

 

Expected result:

Screen Shot 2018-09-07 at 22.25.48.png

 

Please can anyone help?

 

Thanks

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

This will create the key variables;

data work.keys;
   set work.temp;
   by super;
   /* this creates maximum of 10 keys, replace with a known max if different*/
   array key{10} $ 20; 
   retain count key: ;
   if first.super then do;
      call missing (of key(*));
      count=1;
   end;
   else count+1;
   key[count]=last;
   if last.super;
   keep super key: ;
run;

I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.

 

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

Try this:

 

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6_ID $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;

data temp;
set Hierarchy;
row = _n_;
array m L1_ID -- L6_Name;
length Man Emp $20;
do i = dim(m) to 2 by -2 while(missing(Man));
    if missing(Emp) then Emp = m{i};
    else Man = catx("-", m{i-1}, m{i});
    end;;
drop i;
run;

proc sort data=temp(keep=Man Emp) out=ManEmp; by Man Emp; run;

data keys;
length key $60;
do until(last.Man);
    set ManEmp; by Man;
    key = catx("|", key, Emp);
    end;
drop Emp;
run;

proc sql;
create table want(drop=row Man) as
select keys.key, temp.*
from temp, keys
where temp.Man = Keys.Man
order by row;
quit;
PG
ballardw
Super User

Here's one way for the sample data.

 

Caveats: if any of your Names contain spaces then the calls to the countw function should explicitly limit the delimiter, if your names have commas, use a different character as the delimiter in the catx function call and then use that as the delimeter for countw.

Depending on the actual number of potential names in the key the length may need to be adjusted. If you actually have 20 character names then the length of key should be at least 20*(max number of expected supervisees)+(number of supervisees -1).

 

This does not deal at all with cases of the same name but different person. It really would make a tad more sensi to use the ID values instead of the name. If there are potentially a "125 David" and "456 David" then create a variable to combine the ID and Name to use for the potential "super" values in my work.temp. The same thing applies for the "key". What do you want if you have multiple supervisees as in :

Judas 505 Drake

Judas 542 Alice

Judas 588 Alice

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;
run;

data work.temp;
   set Hierarchy;
   array nm L1_Name L2_Name L3_Name L4_Name L5_Name L6_Name;
   super= nm[countw(catx(',',of nm(*)))-1];
   last = nm[countw(catx(',',of nm(*))) ];
run;
proc sort data=work.temp;
   by super last;
run;

data work.keys;
   set work.temp;
   by super;
   length key $ 60;
   retain key;
   if first.super then key=last;
   else key=catx('|',key,last);
   if last.super;
   keep super key;
run;

proc sql;
   create table want as
   select b.key,  a.L1_ID, a.L1_Name, a.L2_ID, a.L2_Name, a.L3_ID
        , a.L3_Name, a.L4_ID, a.L4_Name, a.L5_ID, a.L5_Name, a.L6_ID, a.L6_Name 
   from work.temp as a
        left join
        work.keys as b
        on a.super = b.super
   ;
quit;
frupaul
Quartz | Level 8

Hi Ballardw,

 

Thanks for your suggestions. This is actually an oversimplification. I just thought people relate better to names than IDs hence why i went for names. However I have ensured there aren't any multiple names with the same ID.

 

Your solution works. But can i ask for a slight adjustment? Instead of assigning the concatenation to Key, can you create new variables Key1, Key2, Key3 .....Keyn to hold the individual values of those names e.g last three observations will be displayed as below (just an excerpt of original). 

Screen Shot 2018-09-08 at 00.20.13.png

 

However, where there is a one to one relationship between manager and staff, the staff name will be assigned to key1.

 

Thanks,

 

 

 

 

ballardw
Super User

This will create the key variables;

data work.keys;
   set work.temp;
   by super;
   /* this creates maximum of 10 keys, replace with a known max if different*/
   array key{10} $ 20; 
   retain count key: ;
   if first.super then do;
      call missing (of key(*));
      count=1;
   end;
   else count+1;
   key[count]=last;
   if last.super;
   keep super key: ;
run;

I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3164 views
  • 2 likes
  • 3 in conversation