Solved: transpose and concat

frupaul · Posted 09-07-2018 05:31 PM

Hi Everyone,

A bit of an interesting one. I have the below table:

Source code:

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;

Context:

The key to unlocking this question lies in the last level of every row i.e. the rightmost level present.

- The level above the last level in every case is the manager. e.g. on row one, Anthony (at L4) is the Manager of Kiki (at L5) and at the last row, David (at L2) is the Manager of Chris (at L3)

Objective:

Objective is to create a new variable called key. The value assigned to key is:

- a concatenation of all the lowest (leftmost) levels assigned to a manager. e.g. take a look at obs 2 and 3, Judas is the Manager of Drake and Alice. So the value of key for these two observations will be "Drake|Alice".

Another example, for observations 6 to 8, David is the manager of Lucas, Chris and Tim. For each of these, the value of key will be "Lucas|Tim|Chris"(OR

- Where only one staff is assigned to a manager (e.g at obs 1,4 and 5) the value assigned to key is that staff's name. e.g the value of Key for obs 1 is "kiki"

Expected result:

Please can anyone help?

Thanks

ballardw · Posted 09-07-2018 08:01 PM

This will create the key variables;

data work.keys;
   set work.temp;
   by super;
   /* this creates maximum of 10 keys, replace with a known max if different*/
   array key{10} $ 20; 
   retain count key: ;
   if first.super then do;
      call missing (of key(*));
      count=1;
   end;
   else count+1;
   key[count]=last;
   if last.super;
   keep super key: ;
run;

I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.

View solution in original post

PGStats · Posted 09-07-2018 06:23 PM

Try this:

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6_ID $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;

data temp;
set Hierarchy;
row = _n_;
array m L1_ID -- L6_Name;
length Man Emp $20;
do i = dim(m) to 2 by -2 while(missing(Man));
    if missing(Emp) then Emp = m{i};
    else Man = catx("-", m{i-1}, m{i});
    end;;
drop i;
run;

proc sort data=temp(keep=Man Emp) out=ManEmp; by Man Emp; run;

data keys;
length key $60;
do until(last.Man);
    set ManEmp; by Man;
    key = catx("|", key, Emp);
    end;
drop Emp;
run;

proc sql;
create table want(drop=row Man) as
select keys.key, temp.*
from temp, keys
where temp.Man = Keys.Man
order by row;
quit;

PG

ballardw · Posted 09-07-2018 06:49 PM

Here's one way for the sample data.

Caveats: if any of your Names contain spaces then the calls to the countw function should explicitly limit the delimiter, if your names have commas, use a different character as the delimiter in the catx function call and then use that as the delimeter for countw.

Depending on the actual number of potential names in the key the length may need to be adjusted. If you actually have 20 character names then the length of key should be at least 20*(max number of expected supervisees)+(number of supervisees -1).

This does not deal at all with cases of the same name but different person. It really would make a tad more sensi to use the ID values instead of the name. If there are potentially a "125 David" and "456 David" then create a variable to combine the ID and Name to use for the potential "super" values in my work.temp. The same thing applies for the "key". What do you want if you have multiple supervisees as in :

Judas 505 Drake

Judas 542 Alice

Judas 588 Alice

data Hierarchy;
infile datalines dlm='' truncover;
length L1_ID $20 L1_Name $20 L2_ID $20 L2_Name $20 L3_ID $20 L3_Name $20 L4_ID $20 L4_Name $20 L5_ID $20 L5_Name $20 L6 $20 L6_Name $20;
input  L1_ID $ L1_Name $ L2_ID $ L2_Name $ L3_ID $ L3_Name $ L4_ID $ L4_Name $ L5_ID $ L5_Name $ L6_ID $ L6_Name $;
cards;
101 Anne 125 David 202 James 312 Anthony 404 Kiki
101 Anne 123 Chun 202 James 345 Judas 505 Drake
101 Anne 125 David 231 Chris 345 Judas 542 Alice
101 Anne 123 Chun 231 Chris 391 Jacob
101 Anne 123 Chun 234 Atem
101 Anne 125 David 239 Lucas
101 Anne 125 David 266 Tim
101 Anne 125 David 231 Chris
;
run;

data work.temp;
   set Hierarchy;
   array nm L1_Name L2_Name L3_Name L4_Name L5_Name L6_Name;
   super= nm[countw(catx(',',of nm(*)))-1];
   last = nm[countw(catx(',',of nm(*))) ];
run;
proc sort data=work.temp;
   by super last;
run;

data work.keys;
   set work.temp;
   by super;
   length key $ 60;
   retain key;
   if first.super then key=last;
   else key=catx('|',key,last);
   if last.super;
   keep super key;
run;

proc sql;
   create table want as
   select b.key,  a.L1_ID, a.L1_Name, a.L2_ID, a.L2_Name, a.L3_ID
        , a.L3_Name, a.L4_ID, a.L4_Name, a.L5_ID, a.L5_Name, a.L6_ID, a.L6_Name 
   from work.temp as a
        left join
        work.keys as b
        on a.super = b.super
   ;
quit;

frupaul · Posted 09-07-2018 07:21 PM

Hi Ballardw,

Thanks for your suggestions. This is actually an oversimplification. I just thought people relate better to names than IDs hence why i went for names. However I have ensured there aren't any multiple names with the same ID.

Your solution works. But can i ask for a slight adjustment? Instead of assigning the concatenation to Key, can you create new variables Key1, Key2, Key3 .....Keyn to hold the individual values of those names e.g last three observations will be displayed as below (just an excerpt of original).

However, where there is a one to one relationship between manager and staff, the staff name will be assigned to key1.

Thanks,

ballardw · Posted 09-07-2018 08:01 PM

This will create the key variables;

data work.keys;
   set work.temp;
   by super;
   /* this creates maximum of 10 keys, replace with a known max if different*/
   array key{10} $ 20; 
   retain count key: ;
   if first.super then do;
      call missing (of key(*));
      count=1;
   end;
   else count+1;
   key[count]=last;
   if last.super;
   keep super key: ;
run;

I'll leave the modification of the Proc SQL to an interested reader. Really pretty trivial adding all key variables instead of a single one.

transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

Re: transpose and concat

Click image to register for webinar

Classroom Training Available!