BookmarkSubscribeRSS Feed
PDevi
Fluorite | Level 6

Edited: Changed the required output data for id 2, which had the wrong start and end levels.

 

Hi all,

 

My company has a system of levels for probationary recruits that everyone has to pass through by completing certain tasks. They are usually given a period of 4 months to go through each level. If somebody has some relevant experience, they can start at a higher level. They are also allowed to change up the sequence in which they go through the levels, meaning they can are allowed to do a higher level task and then come back and finish the rest. If the person moves to another department, they repeat this.

 

The data looks as follows:

id      Month/Year  Start Month/Year  End Month/Year  Dept.    Level
1 Jan-17 Jan-17 Sep-17 Acct. 1
1 May-17 Jan-17 Sep-17 Acct. 2
1 Sep-17 Jan-17 Sep-17 Acct. 3
1 Jan-18 Jan-17 Sep-17 Acct. 4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR 1
3 Jan-19 Sep-18 Sep-19 HR 2
3 May-19 Sep-18 Sep-19 HR 4
3 Sep-19 Sep-18 Sep-19 HR 3
1 Apr-20 Apr-20 Mar-21 SCM 1
1 Aug-20 Apr-20 Mar-21 SCM 2
1 Nov-20 Apr-20 Mar-21 SCM 3
1 Mar-21 Apr-20 Mar-21 SCM 4

 

I need to add two columns that give me the start and end levels each person took by dept. So, what I need is as below:

id      Month/Year  Start Month/Year  End Month/Year  Dept.    Level Start level  End level
1 Jan-17 Jan-17 Sep-17 Acct. 1 1 4
1 May-17 Jan-17 Sep-17 Acct. 2 1 4
1 Sep-17 Jan-17 Sep-17 Acct. 3 1 4
1 Jan-18 Jan-17 Sep-17 Acct. 4 1 4
2 Feb-19 Feb-19 Oct-19 Finance 2 2 4
2 Jun-19 Feb-19 Oct-19 Finance 3 2 4
2 Oct-19 Feb-19 Oct-19 Finance 4 2 4
3 Sep-18 Sep-18 Sep-19 HR 1 1 3
3 Jan-19 Sep-18 Sep-19 HR 2 1 3
3 May-19 Sep-18 Sep-19 HR 4 1 3
3 Sep-19 Sep-18 Sep-19 HR 3 1 3
1 Apr-20 Apr-20 Mar-21 SCM 1 1 2
1 Aug-20 Apr-20 Mar-21 SCM 4 1 2
1 Nov-20 Apr-20 Mar-21 SCM 3 1 2
1 Mar-21 Apr-20 Mar-21 SCM 2 1 2

 

I would appreciate it if someone could please help me with this. Thank you in advance for all the help.

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

Do something like this

 

data have;
input id $ MonthYear $ StartMonthYear $ EndMonthYear $ Dept $ Level;
datalines;
1 Jan-17 Jan-17 Sep-17 Acct.   1
1 May-17 Jan-17 Sep-17 Acct.   2
1 Sep-17 Jan-17 Sep-17 Acct.   3
1 Jan-18 Jan-17 Sep-17 Acct.   4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR      1
3 Jan-19 Sep-18 Sep-19 HR      2
3 May-19 Sep-18 Sep-19 HR      4
3 Sep-19 Sep-18 Sep-19 HR      3
1 Apr-20 Apr-20 Mar-21 SCM     1
1 Aug-20 Apr-20 Mar-21 SCM     2
1 Nov-20 Apr-20 Mar-21 SCM     3
1 Mar-21 Apr-20 Mar-21 SCM     4
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
      by id notsorted;
      if _N_ = 1 then start = Level;
   end;

   end = Level;

   do _N_ = 1 to _N_;
      set have;
      output;
   end;
run;
mkeintz
PROC Star

ID 2 has a sequence of 2, 3, 4.  So why does it have start level 3 and end level 2?

 

Assuming that is an error, then here's a technique to pass through each id twice.  The first pass determines the start and end levels.  the second pass re-reads the id and outputs each obs with the recently determined start and end levels:

 

data want;
  set have (in=firstpass)  have (in=secondpass);
  by id;
  retain start_level end_level;
  if first.id then start_level=level;
  if in=firstpass then end_level=level;
  if secondpass;
run;

 

The by statement create the dummy variables first.id and last.id, providing a way to know when an ID is beginning or ending.  So when FIRST.ID is true, then you have start_level.

 

Since the HAVE dataset is specified twice in the SET statement, the presence of the BY ID statement causes the data to be interleaved - i.e. each id is completely read twice  before progressing to the next id.

 

The retain tells SAS not to reset the variables to missing.

 

And the "if in=firstpass" keeps update the end_level as long as your are on the first pass.

 

Finally, there is a subsetting if:  "if secondpass;"

 

The firstpass dummy is tested

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1101 views
  • 1 like
  • 3 in conversation