Edited: Changed the required output data for id 2, which had the wrong start and end levels.
Hi all,
My company has a system of levels for probationary recruits that everyone has to pass through by completing certain tasks. They are usually given a period of 4 months to go through each level. If somebody has some relevant experience, they can start at a higher level. They are also allowed to change up the sequence in which they go through the levels, meaning they can are allowed to do a higher level task and then come back and finish the rest. If the person moves to another department, they repeat this.
The data looks as follows:
id | Month/Year | Start Month/Year | End Month/Year | Dept. | Level |
1 | Jan-17 | Jan-17 | Sep-17 | Acct. | 1 |
1 | May-17 | Jan-17 | Sep-17 | Acct. | 2 |
1 | Sep-17 | Jan-17 | Sep-17 | Acct. | 3 |
1 | Jan-18 | Jan-17 | Sep-17 | Acct. | 4 |
2 | Feb-19 | Feb-19 | Oct-19 | Finance | 2 |
2 | Jun-19 | Feb-19 | Oct-19 | Finance | 3 |
2 | Oct-19 | Feb-19 | Oct-19 | Finance | 4 |
3 | Sep-18 | Sep-18 | Sep-19 | HR | 1 |
3 | Jan-19 | Sep-18 | Sep-19 | HR | 2 |
3 | May-19 | Sep-18 | Sep-19 | HR | 4 |
3 | Sep-19 | Sep-18 | Sep-19 | HR | 3 |
1 | Apr-20 | Apr-20 | Mar-21 | SCM | 1 |
1 | Aug-20 | Apr-20 | Mar-21 | SCM | 2 |
1 | Nov-20 | Apr-20 | Mar-21 | SCM | 3 |
1 | Mar-21 | Apr-20 | Mar-21 | SCM | 4 |
I need to add two columns that give me the start and end levels each person took by dept. So, what I need is as below:
id | Month/Year | Start Month/Year | End Month/Year | Dept. | Level | Start level | End level |
1 | Jan-17 | Jan-17 | Sep-17 | Acct. | 1 | 1 | 4 |
1 | May-17 | Jan-17 | Sep-17 | Acct. | 2 | 1 | 4 |
1 | Sep-17 | Jan-17 | Sep-17 | Acct. | 3 | 1 | 4 |
1 | Jan-18 | Jan-17 | Sep-17 | Acct. | 4 | 1 | 4 |
2 | Feb-19 | Feb-19 | Oct-19 | Finance | 2 | 2 | 4 |
2 | Jun-19 | Feb-19 | Oct-19 | Finance | 3 | 2 | 4 |
2 | Oct-19 | Feb-19 | Oct-19 | Finance | 4 | 2 | 4 |
3 | Sep-18 | Sep-18 | Sep-19 | HR | 1 | 1 | 3 |
3 | Jan-19 | Sep-18 | Sep-19 | HR | 2 | 1 | 3 |
3 | May-19 | Sep-18 | Sep-19 | HR | 4 | 1 | 3 |
3 | Sep-19 | Sep-18 | Sep-19 | HR | 3 | 1 | 3 |
1 | Apr-20 | Apr-20 | Mar-21 | SCM | 1 | 1 | 2 |
1 | Aug-20 | Apr-20 | Mar-21 | SCM | 4 | 1 | 2 |
1 | Nov-20 | Apr-20 | Mar-21 | SCM | 3 | 1 | 2 |
1 | Mar-21 | Apr-20 | Mar-21 | SCM | 2 | 1 | 2 |
I would appreciate it if someone could please help me with this. Thank you in advance for all the help.
Do something like this
data have;
input id $ MonthYear $ StartMonthYear $ EndMonthYear $ Dept $ Level;
datalines;
1 Jan-17 Jan-17 Sep-17 Acct. 1
1 May-17 Jan-17 Sep-17 Acct. 2
1 Sep-17 Jan-17 Sep-17 Acct. 3
1 Jan-18 Jan-17 Sep-17 Acct. 4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR 1
3 Jan-19 Sep-18 Sep-19 HR 2
3 May-19 Sep-18 Sep-19 HR 4
3 Sep-19 Sep-18 Sep-19 HR 3
1 Apr-20 Apr-20 Mar-21 SCM 1
1 Aug-20 Apr-20 Mar-21 SCM 2
1 Nov-20 Apr-20 Mar-21 SCM 3
1 Mar-21 Apr-20 Mar-21 SCM 4
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id notsorted;
if _N_ = 1 then start = Level;
end;
end = Level;
do _N_ = 1 to _N_;
set have;
output;
end;
run;
ID 2 has a sequence of 2, 3, 4. So why does it have start level 3 and end level 2?
Assuming that is an error, then here's a technique to pass through each id twice. The first pass determines the start and end levels. the second pass re-reads the id and outputs each obs with the recently determined start and end levels:
data want;
set have (in=firstpass) have (in=secondpass);
by id;
retain start_level end_level;
if first.id then start_level=level;
if in=firstpass then end_level=level;
if secondpass;
run;
The by statement create the dummy variables first.id and last.id, providing a way to know when an ID is beginning or ending. So when FIRST.ID is true, then you have start_level.
Since the HAVE dataset is specified twice in the SET statement, the presence of the BY ID statement causes the data to be interleaved - i.e. each id is completely read twice before progressing to the next id.
The retain tells SAS not to reset the variables to missing.
And the "if in=firstpass" keeps update the end_level as long as your are on the first pass.
Finally, there is a subsetting if: "if secondpass;"
The firstpass dummy is tested
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.