Re: Select first and last event

PDevi · Posted 02-04-2021 09:41 PM

Edited: Changed the required output data for id 2, which had the wrong start and end levels.

Hi all,

My company has a system of levels for probationary recruits that everyone has to pass through by completing certain tasks. They are usually given a period of 4 months to go through each level. If somebody has some relevant experience, they can start at a higher level. They are also allowed to change up the sequence in which they go through the levels, meaning they can are allowed to do a higher level task and then come back and finish the rest. If the person moves to another department, they repeat this.

The data looks as follows:

id	Month/Year	Start Month/Year	End Month/Year	Dept.	Level
1	Jan-17	Jan-17	Sep-17	Acct.	1
1	May-17	Jan-17	Sep-17	Acct.	2
1	Sep-17	Jan-17	Sep-17	Acct.	3
1	Jan-18	Jan-17	Sep-17	Acct.	4
2	Feb-19	Feb-19	Oct-19	Finance	2
2	Jun-19	Feb-19	Oct-19	Finance	3
2	Oct-19	Feb-19	Oct-19	Finance	4
3	Sep-18	Sep-18	Sep-19	HR	1
3	Jan-19	Sep-18	Sep-19	HR	2
3	May-19	Sep-18	Sep-19	HR	4
3	Sep-19	Sep-18	Sep-19	HR	3
1	Apr-20	Apr-20	Mar-21	SCM	1
1	Aug-20	Apr-20	Mar-21	SCM	2
1	Nov-20	Apr-20	Mar-21	SCM	3
1	Mar-21	Apr-20	Mar-21	SCM	4

I need to add two columns that give me the start and end levels each person took by dept. So, what I need is as below:

id	Month/Year	Start Month/Year	End Month/Year	Dept.	Level	Start level	End level
1	Jan-17	Jan-17	Sep-17	Acct.	1	1	4
1	May-17	Jan-17	Sep-17	Acct.	2	1	4
1	Sep-17	Jan-17	Sep-17	Acct.	3	1	4
1	Jan-18	Jan-17	Sep-17	Acct.	4	1	4
2	Feb-19	Feb-19	Oct-19	Finance	2	2	4
2	Jun-19	Feb-19	Oct-19	Finance	3	2	4
2	Oct-19	Feb-19	Oct-19	Finance	4	2	4
3	Sep-18	Sep-18	Sep-19	HR	1	1	3
3	Jan-19	Sep-18	Sep-19	HR	2	1	3
3	May-19	Sep-18	Sep-19	HR	4	1	3
3	Sep-19	Sep-18	Sep-19	HR	3	1	3
1	Apr-20	Apr-20	Mar-21	SCM	1	1	2
1	Aug-20	Apr-20	Mar-21	SCM	4	1	2
1	Nov-20	Apr-20	Mar-21	SCM	3	1	2
1	Mar-21	Apr-20	Mar-21	SCM	2	1	2

I would appreciate it if someone could please help me with this. Thank you in advance for all the help.

PeterClemmensen · Posted 02-05-2021 12:44 AM

Do something like this

data have;
input id $ MonthYear $ StartMonthYear $ EndMonthYear $ Dept $ Level;
datalines;
1 Jan-17 Jan-17 Sep-17 Acct.   1
1 May-17 Jan-17 Sep-17 Acct.   2
1 Sep-17 Jan-17 Sep-17 Acct.   3
1 Jan-18 Jan-17 Sep-17 Acct.   4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR      1
3 Jan-19 Sep-18 Sep-19 HR      2
3 May-19 Sep-18 Sep-19 HR      4
3 Sep-19 Sep-18 Sep-19 HR      3
1 Apr-20 Apr-20 Mar-21 SCM     1
1 Aug-20 Apr-20 Mar-21 SCM     2
1 Nov-20 Apr-20 Mar-21 SCM     3
1 Mar-21 Apr-20 Mar-21 SCM     4
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
      by id notsorted;
      if _N_ = 1 then start = Level;
   end;

   end = Level;

   do _N_ = 1 to _N_;
      set have;
      output;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

mkeintz · Posted 02-05-2021 12:45 AM

ID 2 has a sequence of 2, 3, 4. So why does it have start level 3 and end level 2?

Assuming that is an error, then here's a technique to pass through each id twice. The first pass determines the start and end levels. the second pass re-reads the id and outputs each obs with the recently determined start and end levels:

data want;
  set have (in=firstpass)  have (in=secondpass);
  by id;
  retain start_level end_level;
  if first.id then start_level=level;
  if in=firstpass then end_level=level;
  if secondpass;
run;

The by statement create the dummy variables first.id and last.id, providing a way to know when an ID is beginning or ending. So when FIRST.ID is true, then you have start_level.

Since the HAVE dataset is specified twice in the SET statement, the presence of the BY ID statement causes the data to be interleaved - i.e. each id is completely read twice before progressing to the next id.

The retain tells SAS not to reset the variables to missing.

And the "if in=firstpass" keeps update the end_level as long as your are on the first pass.

Finally, there is a subsetting if: "if secondpass;"

~~The firstpass dummy is tested~~

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Select first and last event

Re: Select first and last event

Re: Select first and last event

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away