BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Atester
Calcite | Level 5

To the clever SAS users of the world, I need your help. The solution might come to me on Monday but nothing was coming to mind by the end of a long week.

 

My problem: I have HR data with records showing an employees’ duration in a given position.  Typically a person is assigned the role for a long or indefinite period, but when the person is away, another employee might be assigned the position for a short period.

 

HAVE:         WANT:      
start_date end_date position employee start_date end_date position employee
1-Sep-14 31-Dec-99 348 574   1-Sep-14 20-Mar-16 348 574
21-Mar-16 6-Jun-16 348 635   21-Mar-16 6-Jun-16 348 635
          7-Jun-16 31-Dec-99 348 574
                 
1-Aug-13 31-Dec-99 554 963   1-Aug-13 20-Jul-14 554 963
21-Jul-14 8-Jul-16 554 580   21-Jul-14 1-Nov-15 554 580
2-Nov-15 27-Nov-15 554 78   2-Nov-15 27-Nov-15 554 78
          28-Nov-15 8-Jul-16 554 580
          9-Jul-16 31-Dec-99 554 963
                 
3-Sep-09 4-Apr-15 742 830   3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 38   21-May-15 10-Jun-15 742 38
11-Jun-15 31-Dec-99 742 313   11-Jun-15 31-Dec-99 742 313
                 
23-Nov-15 11-Dec-15 751 44   23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125   12-Dec-15 7-Apr-16 751 125
8-Apr-16 22-Apr-16 751 44   8-Apr-16 22-Apr-16 751 44
          23-Apr-16 31-Dec-99 751 125
                 
10-Mar-16 23-Mar-16 231 67   10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723   24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32   7-Apr-16 20-Apr-16 231 839
7-Apr-16 20-Apr-16 231 839   21-Apr-16 31-Dec-99 231 32

 

I want to convert these overlapping records into a single time series so it’s clear who was/is in a position on a given date.

 

The situations can become complicated.  And ambiguous.  The example data illustrates the appropriate assumptions.   

 

I didn't think it would be that difficult but I soon realised first. last. processing probably won't do.  Using the LAG function looked impractical, and dangerous, when mixed with too many IF-THEN-ELSE statements.  Look-ahead processing with extra SET statements might be the way.  And maybe a little array processing could be handy.  Perhaps Proc FCMP and a recursive routine.  I can even imagine a SQL solution - from an SQL guru.

 

Is anyone up to the challange?  I'm taking a break.

 

TIA

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
That is really not easy. There are too many scenarios you need to consider about.
I listed all the scenarios, you need to pick up which one you should keep.
I love this question.
NOTE: I convert 31-Dec-99 into today() , you can convert it back if you need to.




data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
 else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-14    31-Dec-99   348 574                
21-Mar-16   6-Jun-16    348 635                
1-Aug-13    31-Dec-99   554 963                
21-Jul-14   8-Jul-16    554 580                
2-Nov-15    27-Nov-15   554 78                 
3-Sep-09    4-Apr-15    742 830                
21-May-15   24-Jun-15   742 38                 
11-Jun-15   31-Dec-99   742 313                
23-Nov-15   11-Dec-15   751 44                 
12-Dec-15   31-Dec-99   751 125                
8-Apr-16    22-Apr-16   751 44                 
10-Mar-16   23-Mar-16   231 67                 
24-Mar-16   6-Apr-16    231 723                
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
 set have;
 do date=start_date to end_date;
  output;
 end;
 keep date position    employee;
 format date date9.;
run;
proc freq data=have noprint;
 tables employee/out=employee(keep=employee);
run;
data _null_;
 set employee end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute(catt('temp(where=(emp_',employee,'=',employee,
 ') rename=(employee=emp_',employee,'))'));
 if last then call execute(';by position date;run;');
run;

proc transpose data=all(keep=emp_: obs=0) out=vname;
run;
options missing=' ';
data _null_;
 set vname end=last;
 if _n_=1 then 
 call execute('data all;length employees $ 20000;retain employees;set all;if '); 
 call execute(catt(_name_,' ne lag(',_name_,')'));
 if last then 
 call execute(' then do;group+1;employees=catx("|",of emp_:);end;run;');
  else call execute(' or ');
run;
data want;
 set all;
 by group;
 retain start_date;
 if first.group then start_date=date;
 if last.group then do;
  end_date=date;output;
 end;
 format start_date end_date date9.;
 drop emp_: date;
run;

proc print data=want noobs;
run;
  

View solution in original post

6 REPLIES 6
Reeza
Super User

Treat it as medicine cabinet problem. 

 

Take each record and create a record for every day. 

Later employees take priority, when duplicated. 

 

Then use first/last to create your want dataset. 

Shmuel
Garnet | Level 18

A remark to @Reeza's answer -  you'll need to sort derived data 

by:  position start_date  before selecting the last.start_date

 

Ksharp
Super User
That is really not easy. There are too many scenarios you need to consider about.
I listed all the scenarios, you need to pick up which one you should keep.
I love this question.
NOTE: I convert 31-Dec-99 into today() , you can convert it back if you need to.




data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
 else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-14    31-Dec-99   348 574                
21-Mar-16   6-Jun-16    348 635                
1-Aug-13    31-Dec-99   554 963                
21-Jul-14   8-Jul-16    554 580                
2-Nov-15    27-Nov-15   554 78                 
3-Sep-09    4-Apr-15    742 830                
21-May-15   24-Jun-15   742 38                 
11-Jun-15   31-Dec-99   742 313                
23-Nov-15   11-Dec-15   751 44                 
12-Dec-15   31-Dec-99   751 125                
8-Apr-16    22-Apr-16   751 44                 
10-Mar-16   23-Mar-16   231 67                 
24-Mar-16   6-Apr-16    231 723                
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
 set have;
 do date=start_date to end_date;
  output;
 end;
 keep date position    employee;
 format date date9.;
run;
proc freq data=have noprint;
 tables employee/out=employee(keep=employee);
run;
data _null_;
 set employee end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute(catt('temp(where=(emp_',employee,'=',employee,
 ') rename=(employee=emp_',employee,'))'));
 if last then call execute(';by position date;run;');
run;

proc transpose data=all(keep=emp_: obs=0) out=vname;
run;
options missing=' ';
data _null_;
 set vname end=last;
 if _n_=1 then 
 call execute('data all;length employees $ 20000;retain employees;set all;if '); 
 call execute(catt(_name_,' ne lag(',_name_,')'));
 if last then 
 call execute(' then do;group+1;employees=catx("|",of emp_:);end;run;');
  else call execute(' or ');
run;
data want;
 set all;
 by group;
 retain start_date;
 if first.group then start_date=date;
 if last.group then do;
  end_date=date;output;
 end;
 format start_date end_date date9.;
 drop emp_: date;
run;

proc print data=want noobs;
run;
  

Atester
Calcite | Level 5

Thanks  @Reeza@Shmuel and @Ksharp.

 

It should have been obvious (to me) that the obvious solution just needed to break down the data to individual days.

 

@Ksharp I appreciate the effort you put into your solution, so you have to get the Accepted Solution. (Glad you enjoyed the challange Smiley Happy )

 

I figured though, that I could assume the latest start_date takes precedence so could manage with a slightly simpler solution.

 

Thanks again everyone.

 

data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-15    31-Dec-99   348 574               
21-Mar-16   6-Jun-16    348 635               
1-Aug-13    31-Dec-99   554 963               
21-Jul-14   8-Jul-16    554 580               
2-Nov-15    27-Nov-15   554 78                
3-Sep-14    4-Apr-15    742 830               
9-Apr-15    28-Apr-15   742 830               
21-May-15   24-Jun-15   742 38                
11-Jun-15   31-Dec-99   742 313               
23-Nov-15   11-Dec-15   751 44                
12-Dec-15   31-Dec-99   751 125               
8-Apr-16    22-Apr-16   751 44                
10-Mar-16   23-Mar-16   231 67                
24-Mar-16   6-Apr-16    231 723               
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
set have;
do date=start_date to end_date;
  output;
end;
keep  position  date  employee start_date end_date;
format date date9.;
run;
proc sort ;
   by position date descending start_date end_date;
run; 

data temp2;
   set temp;
   by position date descending start_date ;
   if first.date;
run;
 
data want;
  keep position start_dt end_dt emp;
  set temp2;
  by position  ;
  retain position emp last_emp last_date start_dt end_dt;
  format start_dt end_dt date9.;
  if first.position then
    do;
     last_emp = employee;
     start_dt= date;
     last_date = date;
     if last.position then
       do;
        emp = employee;
        end_dt = date;
        output;
       end;
     else
       return; 
     end;
 
  if employee = last_emp  and  date = last_date+1 then
     last_date = date;
  else
    link output_rec;
 
  if last.position then
     link output_rec;
return;
 
output_rec:
     emp = last_emp;
     end_dt = last_date;
     output;
     last_emp = employee;
     last_date = date;
     start_dt = date;
return;
 
run;
 

 

  

Shmuel
Garnet | Level 18

Just to display an alternative checked code, maybe easier to understand.

On my platform 31_dec_99 it is interpreted as 31DEC1999 therefore today() was prefered .

For my covenient i used ddmmyy10. format instead date9. - you may change it.

 

data have;
input start_date : date9. end_date : date9.
position employee ;
if end_date = '31DEC99'd then end_date=today(); 
format start_date end_date date9.;
cards;
1-Sep-14 31-Dec-99 348 574
21-Mar-16 6-Jun-16 348 635
1-Aug-13 31-Dec-99 554 963
21-Jul-14 8-Jul-16 554 580
2-Nov-15 27-Nov-15 554 78
3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 38
11-Jun-15 31-Dec-99 742 313
23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125
8-Apr-16 22-Apr-16 751 44
10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32
7-Apr-16 20-Apr-16 231 839
;
run;


proc sort data=have out=temp1;
  by position start_date employee;
run;

 

data temp2(keep=position date priority employee);
 set temp1;
  by position start_date employee;
       retain priority;
      if first.position then priority = 0;
      if first.employee then priority+1;
      do date=start_date to end_date; output; end;
      format date ddmmyy10.;
run;


proc sort data=temp2 out=temp3;
  by position date priority ;
run;


data temp4(drop=priority);
  set temp3;
   by position notsorted date;
       if last.date;
run;

 

data want(drop=date);
  set temp4;
   by position notsorted employee;
        retain start_date;
        format start_date end_date ddmmyy10.;
        if first.employee then start_date = date;
        if last.employee then do;
           end_date = date;
           output;
       end;
run;

 

Sory for being not free to develop it earlier.

Anyhow it was a chalenge.

Atester
Calcite | Level 5

Thanks - it's certainly neater. 

 

Slight amendment needed though, to handle when an employee has separate spells in a position, without someone else in between.

 

I've made small amendments accordingly, highlighted below.

 

data have;
input start_date : date9. end_date : date9.
position employee ;
if end_date = '31DEC99'd then end_date=today(); 
format start_date end_date date9.;
cards;
1-Sep-14 31-Dec-99 348 574
21-Mar-16 6-Jun-16 348 635
1-Aug-13 31-Dec-99 554 963
21-Jul-14 8-Jul-16 554 580
2-Nov-15 27-Nov-15 554 78
3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 830
11-Jun-15 31-Dec-99 742 313
23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125
8-Apr-16 22-Apr-16 751 44
10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32
7-Apr-16 20-Apr-16 231 839
;
run;


proc sort data=have out=temp1;
  by position start_date employee;
run;

 

data temp2(keep=position date priority employee);
 set temp1;
  by position start_date employee;
       retain priority;
      if first.position then priority = 0;
      if first.employee then priority+1;
      do date=start_date to end_date; output; end;
      format date ddmmyy10.;
run;


proc sort data=temp2 out=temp3;
  by position date priority ;
run;


data temp4 /* (drop=priority) */;
  set temp3;
   by position notsorted date;
   if last.date;
run;

 

data want(drop=date priority);
  set temp4;
   by position notsorted employee priority;
        retain start_date;
        format start_date end_date ddmmyy10.;
        if first.priority then start_date = date;
        if last.priority then do;
           end_date = date;
           output;
       end;
run;

 

 

Thanks again. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1105 views
  • 2 likes
  • 4 in conversation