BookmarkSubscribeRSS Feed
PDevi
Fluorite | Level 6

Edited: Changed the required output data for id 2, which had the wrong start and end levels.

 

Hi all,

 

My company has a system of levels for probationary recruits that everyone has to pass through by completing certain tasks. They are usually given a period of 4 months to go through each level. If somebody has some relevant experience, they can start at a higher level. They are also allowed to change up the sequence in which they go through the levels, meaning they can are allowed to do a higher level task and then come back and finish the rest. If the person moves to another department, they repeat this.

 

The data looks as follows:

id      Month/Year  Start Month/Year  End Month/Year  Dept.    Level
1 Jan-17 Jan-17 Sep-17 Acct. 1
1 May-17 Jan-17 Sep-17 Acct. 2
1 Sep-17 Jan-17 Sep-17 Acct. 3
1 Jan-18 Jan-17 Sep-17 Acct. 4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR 1
3 Jan-19 Sep-18 Sep-19 HR 2
3 May-19 Sep-18 Sep-19 HR 4
3 Sep-19 Sep-18 Sep-19 HR 3
1 Apr-20 Apr-20 Mar-21 SCM 1
1 Aug-20 Apr-20 Mar-21 SCM 2
1 Nov-20 Apr-20 Mar-21 SCM 3
1 Mar-21 Apr-20 Mar-21 SCM 4

 

I need to add two columns that give me the start and end levels each person took by dept. So, what I need is as below:

id      Month/Year  Start Month/Year  End Month/Year  Dept.    Level Start level  End level
1 Jan-17 Jan-17 Sep-17 Acct. 1 1 4
1 May-17 Jan-17 Sep-17 Acct. 2 1 4
1 Sep-17 Jan-17 Sep-17 Acct. 3 1 4
1 Jan-18 Jan-17 Sep-17 Acct. 4 1 4
2 Feb-19 Feb-19 Oct-19 Finance 2 2 4
2 Jun-19 Feb-19 Oct-19 Finance 3 2 4
2 Oct-19 Feb-19 Oct-19 Finance 4 2 4
3 Sep-18 Sep-18 Sep-19 HR 1 1 3
3 Jan-19 Sep-18 Sep-19 HR 2 1 3
3 May-19 Sep-18 Sep-19 HR 4 1 3
3 Sep-19 Sep-18 Sep-19 HR 3 1 3
1 Apr-20 Apr-20 Mar-21 SCM 1 1 2
1 Aug-20 Apr-20 Mar-21 SCM 4 1 2
1 Nov-20 Apr-20 Mar-21 SCM 3 1 2
1 Mar-21 Apr-20 Mar-21 SCM 2 1 2

 

I would appreciate it if someone could please help me with this. Thank you in advance for all the help.

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

Do something like this

 

data have;
input id $ MonthYear $ StartMonthYear $ EndMonthYear $ Dept $ Level;
datalines;
1 Jan-17 Jan-17 Sep-17 Acct.   1
1 May-17 Jan-17 Sep-17 Acct.   2
1 Sep-17 Jan-17 Sep-17 Acct.   3
1 Jan-18 Jan-17 Sep-17 Acct.   4
2 Feb-19 Feb-19 Oct-19 Finance 2
2 Jun-19 Feb-19 Oct-19 Finance 3
2 Oct-19 Feb-19 Oct-19 Finance 4
3 Sep-18 Sep-18 Sep-19 HR      1
3 Jan-19 Sep-18 Sep-19 HR      2
3 May-19 Sep-18 Sep-19 HR      4
3 Sep-19 Sep-18 Sep-19 HR      3
1 Apr-20 Apr-20 Mar-21 SCM     1
1 Aug-20 Apr-20 Mar-21 SCM     2
1 Nov-20 Apr-20 Mar-21 SCM     3
1 Mar-21 Apr-20 Mar-21 SCM     4
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
      by id notsorted;
      if _N_ = 1 then start = Level;
   end;

   end = Level;

   do _N_ = 1 to _N_;
      set have;
      output;
   end;
run;
mkeintz
PROC Star

ID 2 has a sequence of 2, 3, 4.  So why does it have start level 3 and end level 2?

 

Assuming that is an error, then here's a technique to pass through each id twice.  The first pass determines the start and end levels.  the second pass re-reads the id and outputs each obs with the recently determined start and end levels:

 

data want;
  set have (in=firstpass)  have (in=secondpass);
  by id;
  retain start_level end_level;
  if first.id then start_level=level;
  if in=firstpass then end_level=level;
  if secondpass;
run;

 

The by statement create the dummy variables first.id and last.id, providing a way to know when an ID is beginning or ending.  So when FIRST.ID is true, then you have start_level.

 

Since the HAVE dataset is specified twice in the SET statement, the presence of the BY ID statement causes the data to be interleaved - i.e. each id is completely read twice  before progressing to the next id.

 

The retain tells SAS not to reset the variables to missing.

 

And the "if in=firstpass" keeps update the end_level as long as your are on the first pass.

 

Finally, there is a subsetting if:  "if secondpass;"

 

The firstpass dummy is tested

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 522 views
  • 1 like
  • 3 in conversation