Version: 7.1 (7.100.0.1966)(64-bit)
This question seems to have been asked many times over, but I am so new to SAS that it's difficult for me to apply solutions that are geared toward even slightly different scenarios.
I would like to convert a spreadsheet of given employee-manager relationships to a data set that gives all employees that report through each employee. Each step down the management pathway would be recorded as a lesser "Levels from Reference". I've attached a sample Excel spreadhseet to illustrate what I'm probably doing a poor job of describing!
Any help would be greatly appreciated!
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
The first step would be to read some data into SAS, likely one set that describes the relationships manager/subordinates and then the employee data. Have you accomplished that?
Thanks for responding--that's good to know about Excel! I pasted the tables that were in the Excel attachment below, hopefully they format properly.
Original Employee-Manager Data | |||
Employee ID | Employee Position | Manager ID | Manager Position |
11111 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
22222 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
33333 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
44444 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | Sr Mgr Revenue Mgmt | 55555 | Dir Accounting |
77777 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
88888 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
99999 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
111110 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
122221 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | Mgr Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
144443 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
Converted Employee Hierarchy | |||||
Reference | Levels from Reference | Employee ID | Employee Position | Manager ID | Manager Position |
11111 | 0 | 19448 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
11111 | -1 | 144733 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
11111 | -1 | 1729532 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
11111 | -1 | 7353675 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
22222 | 0 | 144733 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
33333 | 0 | 150599 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
44444 | 0 | 1016187 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | 0 | 1558477 | Sr Mgr Revenue Mgmt | 55555 | Dir Accounting |
66666 | -1 | 19448 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -1 | 1753292 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -1 | 4621165 | Mgr Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -2 | 144733 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
66666 | -2 | 150599 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 1016187 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 1621911 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 1729532 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
66666 | -2 | 1786326 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 3007598 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 7353675 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
77777 | 0 | 1621911 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
88888 | 0 | 1729532 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
99999 | 0 | 1753292 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
111110 | 0 | 1786326 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
122221 | 0 | 3007598 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | 0 | 4621165 | Mgr Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
133332 | -1 | 150599 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 1016187 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 1621911 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 1786326 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 3007598 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
144443 | 0 | 7353675 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
I have read the data into SAS using EG's Import Data feature -- that code is below.
DATA WORK.'Example Employee Hierarchy Conve'n; LENGTH 'Employee ID'n 8 'Employee Position'n $ 23 'Manager ID'n 8 'Manager Position'n $ 23 ; FORMAT 'Employee ID'n BEST12. 'Employee Position'n $CHAR23. 'Manager ID'n BEST12. 'Manager Position'n $CHAR23. ; INFORMAT 'Employee ID'n BEST12. 'Employee Position'n $CHAR23. 'Manager ID'n BEST12. 'Manager Position'n $CHAR23. ; INFILE '/work/sas/SAS_EXAMPLE/#LN00020' LRECL=55 ENCODING="LATIN1" TERMSTR=CRLF DLM='7F'x MISSOVER DSD ; INPUT 'Employee ID'n : BEST32. 'Employee Position'n : $CHAR23. 'Manager ID'n : BEST32. 'Manager Position'n : $CHAR23. ; RUN;
Please let me know if there's any more information I could provide that would be helpful!
I'm sorry, I gave a converted table with incorrect employee IDs. The table below has accurate values.
Reference | Levels from Reference | Employee ID | Employee Position | Manager ID | Manager Position |
11111 | 0 | 11111 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
11111 | -1 | 22222 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
11111 | -1 | 88888 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
11111 | -1 | 144443 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
22222 | 0 | 22222 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
33333 | 0 | 33333 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
44444 | 0 | 44444 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | 0 | 66666 | Sr Mgr Revenue Mgmt | 55555 | Dir Accounting |
66666 | -1 | 11111 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -1 | 99999 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -1 | 133332 | Mgr Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
66666 | -2 | 22222 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
66666 | -2 | 33333 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 44444 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 77777 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 88888 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
66666 | -2 | 111110 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 122221 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
66666 | -2 | 144443 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
77777 | 0 | 77777 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
88888 | 0 | 88888 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
99999 | 0 | 99999 | Sr Analyst Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
111110 | 0 | 111110 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
122221 | 0 | 122221 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | 0 | 133332 | Mgr Revenue Mgmt | 66666 | Sr Mgr Revenue Mgmt |
133332 | -1 | 33333 | Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 44444 | Asst Mgr Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 77777 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 111110 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
133332 | -1 | 122221 | Sr Analyst Revenue Mgmt | 133332 | Mgr Revenue Mgmt |
144443 | 0 | 144443 | General Clerk | 11111 | Sr Analyst Revenue Mgmt |
What most people in this forum prefer are sample data created via a SAS data step. I've done this now for you.
data WORK.HAVE;
infile datalines dsd truncover;
input Employee_ID:BEST. Employee_Position:$23. Manager_ID:BEST. Manager_Position:$23.;
datalines4;
11111,Sr Analyst Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
22222,General Clerk,11111,Sr Analyst Revenue Mgmt
33333,Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
44444,Asst Mgr Revenue Mgmt,133332,Mgr Revenue Mgmt
66666,Sr Mgr Revenue Mgmt,55555,Dir Accounting
77777,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
88888,General Clerk,11111,Sr Analyst Revenue Mgmt
99999,Sr Analyst Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
111110,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
122221,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
133332,Mgr Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
144443,General Clerk,11111,Sr Analyst Revenue Mgmt
;;;;
I've created this data in the following way:
/* read excel into SAS dataset HAVE */
options validvarname=v7;
proc import
out=work.have
file="c:\temp\Example Employee Hierarchy Conversion.xlsx"
dbms=xlsx
replace;
range='Example$A2:D14';
getnames=yes;
run;
/* Create SAS datastep from HAVE to create sample data for SAS Forum */
%data2datastep(have,work,c:\temp\sampledata.txt)
Explanation of macro %data2datastep and source code here:
http://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
(link to source code at bottom but before discussions under the 2nd link above)
That's really handy, thank you, Patrick! Saving this for future reference.
Essentially, if you filter the Reference column for any employee ID, it will give each employee who reports through that Reference employee and the number of levels below that Reference employee each employee is. Maybe it'll help if I walk through a much shorter, specific example.
Take these three employee-manager relationships:
data WORK.HAVE; infile datalines dsd truncover; input Employee_ID:BEST. Manager_ID:BEST.; datalines2; 11111,33333 22222,33333 33333,44444
For each Employee ID, a level 0 reference record is created. Next, the Manager ID column is searched for that reference ID. This will give any direct reports who are -1 levels from the reference. Then, each direct report's ID is searched in the manager ID column. This gives anyone who indirectly reports -2 levels from the reference. After the entire employee hierarchy that reports through the reference is exhausted, the program moves to the next employee to repeat the process.
In the case above, this would be the output.
Reference ID | Levels from Reference ID | Employee ID | Manager ID |
11111 | 0 | 11111 | 33333 |
22222 | 0 | 22222 | 33333 |
33333 | 0 | 33333 | 44444 |
33333 | -1 | 11111 | 33333 |
33333 | -1 | 22222 | 33333 |
Does that make more sense?
I'd probably go for a more traditional way for creating an employee dimension table.
The following code works under the assumption that each employee has only 1 direct line manager. You should also add the director to your initial data.
data WORK.HAVE;
infile datalines dsd truncover;
input Employee_ID:BEST. Employee_Position:$23. Manager_ID:BEST. Manager_Position:$23.;
datalines4;
11111,Sr Analyst Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
22222,General Clerk,11111,Sr Analyst Revenue Mgmt
33333,Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
44444,Asst Mgr Revenue Mgmt,133332,Mgr Revenue Mgmt
66666,Sr Mgr Revenue Mgmt,55555,Dir Accounting
77777,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
88888,General Clerk,11111,Sr Analyst Revenue Mgmt
99999,Sr Analyst Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
111110,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
122221,Sr Analyst Revenue Mgmt,133332,Mgr Revenue Mgmt
133332,Mgr Revenue Mgmt,66666,Sr Mgr Revenue Mgmt
144443,General Clerk,11111,Sr Analyst Revenue Mgmt
;;;;
run;
/* work out hierarchy structure and number of levels */
data hierarchy;
if _n_=1 then
do;
if 0 then set have(keep=Employee_ID Manager_ID);
dcl hash mgr(dataset:'have(keep=Employee_ID Manager_ID)');
mgr.defineKey('Employee_ID');
mgr.defineData('Manager_ID');
mgr.defineDone();
dcl hash emp(dataset:'have(keep=Employee_ID Manager_ID)');
emp.defineKey('Manager_ID');
emp.defineData('Employee_ID');
emp.defineDone();
end;
set have(keep=Employee_ID Manager_ID);
/* only create chain for leaf employees (=no direct reports) */
if emp.check(key:Employee_ID)=0 then return;
level=1;
output;
do while( mgr.find(key:Manager_ID) = 0 );
level+1;
output;
end;
run;
/* determine hierarchy level per manager and max. hierarchy levels */
proc sql noprint;
create table hier as
select
manager_id,
max(level) as level
from hierarchy
group by manager_id
;
select cats(max(level)) into :n_hier from hier;
quit;
/* build employee dimension table */
data emp_dim(drop=Manager_ID Manager_Position level);
if _n_=1 then
do;
if 0 then set have hier;
dcl hash hier(dataset:'hier');
hier.defineKey('manager_id');
hier.defineData('level');
hier.defineDone();
dcl hash mgr(dataset:'have(keep=Employee_ID Manager_ID)');
mgr.defineKey('Employee_ID');
mgr.defineData('Manager_ID');
mgr.defineDone();
end;
call missing(of _all_);
set have;
array hiers {&n_hier} 8 level&n_hier - level1;
level0='Company';
if hier.find()=0 then hiers[level]=Manager_ID;
do while( mgr.find(key:Manager_ID) = 0 );
if hier.find()=0 then hiers[level]=Manager_ID;
end;
run;
Result:
@Patrick Thanks again for taking the time to be incredibly helpful! This style of output seems like it would work for identifying the reporting chain both upward and downward, which I like.
I'm trying to think through how this output would work for my purposes--which is to combine this hierarchy data with other data in Tableau for reporting purposes. Say I want to find each employee that reports through manager 66666, would that manager only ever be found in one level (a single column)? Would it be possible to add an identifier, if an employee is a manager, that will give the "level" of manager they are, or is there a better way to programatically find all of the employees that report through a given manager?
I suppose the advantage of the output style I was originally going for is that it's intuitively straight forward to identify all of the employees that report through a given manager and to structure output based on the level difference from manager to each employee, but the style you've come up with is probably traditional for good reason. It would certainly be more compact, especially given a population of 50k+!
"Say I want to find each employee that reports through manager 66666, would that manager only ever be found in one level (a single column)?"
A manager is also an employee. So you first filter employee_id for 66666. The last missing "level" variable then tells you if and on what level this employee is in the role of a manager. Then you filter the records on this level variable (level2 for 66666) which returns the full reporting structure for this manager. The direct reports are the ones where level2=66666 and level3 is missing.
If you want to add the management level:
There is a variable "level" in the code which I've dropped. This variable contains the management level. Add it back in if you like. And yes: A manager is only found on a single level.
What will work for you depends of course what you intend to do with the data and also how your real data looks like. For example: Is it possible for an employee to have more than one entry in your source table (eg. being in two roles). If so then what I've posted would require amendments.
I have no experience with Tableau. Can you work with multiple tables or do you need everything in a single denormalized datamart. If you're going for some sort of OLAP then you would likely need to go for some sort of dimensional modelling first - whether you then create in the end a single datamart or not.
Hi Can you suggest what all modification needs to be done. if we have duplicate(that will form another hierarchy/tree) node. currently I am working on some similar project where I need to create a parent child grandchildren ....so on. relation table. Your code seems to be perfect but only thing is that I have duplicate also in the input file.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.