DATA Step, Macro, Functions and more

Not so simple time-line processing

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Not so simple time-line processing

[ Edited ]

To the clever SAS users of the world, I need your help. The solution might come to me on Monday but nothing was coming to mind by the end of a long week.

 

My problem: I have HR data with records showing an employees’ duration in a given position.  Typically a person is assigned the role for a long or indefinite period, but when the person is away, another employee might be assigned the position for a short period.

 

HAVE:         WANT:      
start_date end_date position employee start_date end_date position employee
1-Sep-14 31-Dec-99 348 574   1-Sep-14 20-Mar-16 348 574
21-Mar-16 6-Jun-16 348 635   21-Mar-16 6-Jun-16 348 635
          7-Jun-16 31-Dec-99 348 574
                 
1-Aug-13 31-Dec-99 554 963   1-Aug-13 20-Jul-14 554 963
21-Jul-14 8-Jul-16 554 580   21-Jul-14 1-Nov-15 554 580
2-Nov-15 27-Nov-15 554 78   2-Nov-15 27-Nov-15 554 78
          28-Nov-15 8-Jul-16 554 580
          9-Jul-16 31-Dec-99 554 963
                 
3-Sep-09 4-Apr-15 742 830   3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 38   21-May-15 10-Jun-15 742 38
11-Jun-15 31-Dec-99 742 313   11-Jun-15 31-Dec-99 742 313
                 
23-Nov-15 11-Dec-15 751 44   23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125   12-Dec-15 7-Apr-16 751 125
8-Apr-16 22-Apr-16 751 44   8-Apr-16 22-Apr-16 751 44
          23-Apr-16 31-Dec-99 751 125
                 
10-Mar-16 23-Mar-16 231 67   10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723   24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32   7-Apr-16 20-Apr-16 231 839
7-Apr-16 20-Apr-16 231 839   21-Apr-16 31-Dec-99 231 32

 

I want to convert these overlapping records into a single time series so it’s clear who was/is in a position on a given date.

 

The situations can become complicated.  And ambiguous.  The example data illustrates the appropriate assumptions.   

 

I didn't think it would be that difficult but I soon realised first. last. processing probably won't do.  Using the LAG function looked impractical, and dangerous, when mixed with too many IF-THEN-ELSE statements.  Look-ahead processing with extra SET statements might be the way.  And maybe a little array processing could be handy.  Perhaps Proc FCMP and a recursive routine.  I can even imagine a SQL solution - from an SQL guru.

 

Is anyone up to the challange?  I'm taking a break.

 

TIA

 


Accepted Solutions
Solution
‎10-31-2016 08:14 AM
Super User
Posts: 10,041

Re: Not so simple time-line processing

That is really not easy. There are too many scenarios you need to consider about.
I listed all the scenarios, you need to pick up which one you should keep.
I love this question.
NOTE: I convert 31-Dec-99 into today() , you can convert it back if you need to.




data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
 else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-14    31-Dec-99   348 574                
21-Mar-16   6-Jun-16    348 635                
1-Aug-13    31-Dec-99   554 963                
21-Jul-14   8-Jul-16    554 580                
2-Nov-15    27-Nov-15   554 78                 
3-Sep-09    4-Apr-15    742 830                
21-May-15   24-Jun-15   742 38                 
11-Jun-15   31-Dec-99   742 313                
23-Nov-15   11-Dec-15   751 44                 
12-Dec-15   31-Dec-99   751 125                
8-Apr-16    22-Apr-16   751 44                 
10-Mar-16   23-Mar-16   231 67                 
24-Mar-16   6-Apr-16    231 723                
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
 set have;
 do date=start_date to end_date;
  output;
 end;
 keep date position    employee;
 format date date9.;
run;
proc freq data=have noprint;
 tables employee/out=employee(keep=employee);
run;
data _null_;
 set employee end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute(catt('temp(where=(emp_',employee,'=',employee,
 ') rename=(employee=emp_',employee,'))'));
 if last then call execute(';by position date;run;');
run;

proc transpose data=all(keep=emp_: obs=0) out=vname;
run;
options missing=' ';
data _null_;
 set vname end=last;
 if _n_=1 then 
 call execute('data all;length employees $ 20000;retain employees;set all;if '); 
 call execute(catt(_name_,' ne lag(',_name_,')'));
 if last then 
 call execute(' then do;group+1;employees=catx("|",of emp_:);end;run;');
  else call execute(' or ');
run;
data want;
 set all;
 by group;
 retain start_date;
 if first.group then start_date=date;
 if last.group then do;
  end_date=date;output;
 end;
 format start_date end_date date9.;
 drop emp_: date;
run;

proc print data=want noobs;
run;
  

View solution in original post


All Replies
Super User
Posts: 19,850

Re: Not so simple time-line processing

Treat it as medicine cabinet problem. 

 

Take each record and create a record for every day. 

Later employees take priority, when duplicated. 

 

Then use first/last to create your want dataset. 

Trusted Advisor
Posts: 1,579

Re: Not so simple time-line processing

A remark to @Reeza's answer -  you'll need to sort derived data 

by:  position start_date  before selecting the last.start_date

 

Solution
‎10-31-2016 08:14 AM
Super User
Posts: 10,041

Re: Not so simple time-line processing

That is really not easy. There are too many scenarios you need to consider about.
I listed all the scenarios, you need to pick up which one you should keep.
I love this question.
NOTE: I convert 31-Dec-99 into today() , you can convert it back if you need to.




data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
 else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-14    31-Dec-99   348 574                
21-Mar-16   6-Jun-16    348 635                
1-Aug-13    31-Dec-99   554 963                
21-Jul-14   8-Jul-16    554 580                
2-Nov-15    27-Nov-15   554 78                 
3-Sep-09    4-Apr-15    742 830                
21-May-15   24-Jun-15   742 38                 
11-Jun-15   31-Dec-99   742 313                
23-Nov-15   11-Dec-15   751 44                 
12-Dec-15   31-Dec-99   751 125                
8-Apr-16    22-Apr-16   751 44                 
10-Mar-16   23-Mar-16   231 67                 
24-Mar-16   6-Apr-16    231 723                
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
 set have;
 do date=start_date to end_date;
  output;
 end;
 keep date position    employee;
 format date date9.;
run;
proc freq data=have noprint;
 tables employee/out=employee(keep=employee);
run;
data _null_;
 set employee end=last;
 if _n_=1 then call execute('data all;merge ');
 call execute(catt('temp(where=(emp_',employee,'=',employee,
 ') rename=(employee=emp_',employee,'))'));
 if last then call execute(';by position date;run;');
run;

proc transpose data=all(keep=emp_: obs=0) out=vname;
run;
options missing=' ';
data _null_;
 set vname end=last;
 if _n_=1 then 
 call execute('data all;length employees $ 20000;retain employees;set all;if '); 
 call execute(catt(_name_,' ne lag(',_name_,')'));
 if last then 
 call execute(' then do;group+1;employees=catx("|",of emp_:);end;run;');
  else call execute(' or ');
run;
data want;
 set all;
 by group;
 retain start_date;
 if first.group then start_date=date;
 if last.group then do;
  end_date=date;output;
 end;
 format start_date end_date date9.;
 drop emp_: date;
run;

proc print data=want noobs;
run;
  

Occasional Contributor
Posts: 5

Re: Not so simple time-line processing

Thanks  @Reeza@Shmuel and @Ksharp.

 

It should have been obvious (to me) that the obvious solution just needed to break down the data to individual days.

 

@Ksharp I appreciate the effort you put into your solution, so you have to get the Accepted Solution. (Glad you enjoyed the challange Smiley Happy )

 

I figured though, that I could assume the latest start_date takes precedence so could manage with a slightly simpler solution.

 

Thanks again everyone.

 

data have;
input start_date : date9. end : $20.   position    employee ;
if upcase(end)='31-DEC-99' then end_date=today();
else end_date=input(end,date9.);
drop end;
format start_date  end_date date9.;
cards;
1-Sep-15    31-Dec-99   348 574               
21-Mar-16   6-Jun-16    348 635               
1-Aug-13    31-Dec-99   554 963               
21-Jul-14   8-Jul-16    554 580               
2-Nov-15    27-Nov-15   554 78                
3-Sep-14    4-Apr-15    742 830               
9-Apr-15    28-Apr-15   742 830               
21-May-15   24-Jun-15   742 38                
11-Jun-15   31-Dec-99   742 313               
23-Nov-15   11-Dec-15   751 44                
12-Dec-15   31-Dec-99   751 125               
8-Apr-16    22-Apr-16   751 44                
10-Mar-16   23-Mar-16   231 67                
24-Mar-16   6-Apr-16    231 723               
7-Apr-16    31-Dec-99   231 32                 
7-Apr-16    20-Apr-16   231 839
;
run;
data temp;
set have;
do date=start_date to end_date;
  output;
end;
keep  position  date  employee start_date end_date;
format date date9.;
run;
proc sort ;
   by position date descending start_date end_date;
run; 

data temp2;
   set temp;
   by position date descending start_date ;
   if first.date;
run;
 
data want;
  keep position start_dt end_dt emp;
  set temp2;
  by position  ;
  retain position emp last_emp last_date start_dt end_dt;
  format start_dt end_dt date9.;
  if first.position then
    do;
     last_emp = employee;
     start_dt= date;
     last_date = date;
     if last.position then
       do;
        emp = employee;
        end_dt = date;
        output;
       end;
     else
       return; 
     end;
 
  if employee = last_emp  and  date = last_date+1 then
     last_date = date;
  else
    link output_rec;
 
  if last.position then
     link output_rec;
return;
 
output_rec:
     emp = last_emp;
     end_dt = last_date;
     output;
     last_emp = employee;
     last_date = date;
     start_dt = date;
return;
 
run;
 

 

  

Trusted Advisor
Posts: 1,579

Re: Not so simple time-line processing

Just to display an alternative checked code, maybe easier to understand.

On my platform 31_dec_99 it is interpreted as 31DEC1999 therefore today() was prefered .

For my covenient i used ddmmyy10. format instead date9. - you may change it.

 

data have;
input start_date : date9. end_date : date9.
position employee ;
if end_date = '31DEC99'd then end_date=today(); 
format start_date end_date date9.;
cards;
1-Sep-14 31-Dec-99 348 574
21-Mar-16 6-Jun-16 348 635
1-Aug-13 31-Dec-99 554 963
21-Jul-14 8-Jul-16 554 580
2-Nov-15 27-Nov-15 554 78
3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 38
11-Jun-15 31-Dec-99 742 313
23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125
8-Apr-16 22-Apr-16 751 44
10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32
7-Apr-16 20-Apr-16 231 839
;
run;


proc sort data=have out=temp1;
  by position start_date employee;
run;

 

data temp2(keep=position date priority employee);
 set temp1;
  by position start_date employee;
       retain priority;
      if first.position then priority = 0;
      if first.employee then priority+1;
      do date=start_date to end_date; output; end;
      format date ddmmyy10.;
run;


proc sort data=temp2 out=temp3;
  by position date priority ;
run;


data temp4(drop=priority);
  set temp3;
   by position notsorted date;
       if last.date;
run;

 

data want(drop=date);
  set temp4;
   by position notsorted employee;
        retain start_date;
        format start_date end_date ddmmyy10.;
        if first.employee then start_date = date;
        if last.employee then do;
           end_date = date;
           output;
       end;
run;

 

Sory for being not free to develop it earlier.

Anyhow it was a chalenge.

Occasional Contributor
Posts: 5

Re: Not so simple time-line processing

Thanks - it's certainly neater. 

 

Slight amendment needed though, to handle when an employee has separate spells in a position, without someone else in between.

 

I've made small amendments accordingly, highlighted below.

 

data have;
input start_date : date9. end_date : date9.
position employee ;
if end_date = '31DEC99'd then end_date=today(); 
format start_date end_date date9.;
cards;
1-Sep-14 31-Dec-99 348 574
21-Mar-16 6-Jun-16 348 635
1-Aug-13 31-Dec-99 554 963
21-Jul-14 8-Jul-16 554 580
2-Nov-15 27-Nov-15 554 78
3-Sep-09 4-Apr-15 742 830
21-May-15 24-Jun-15 742 830
11-Jun-15 31-Dec-99 742 313
23-Nov-15 11-Dec-15 751 44
12-Dec-15 31-Dec-99 751 125
8-Apr-16 22-Apr-16 751 44
10-Mar-16 23-Mar-16 231 67
24-Mar-16 6-Apr-16 231 723
7-Apr-16 31-Dec-99 231 32
7-Apr-16 20-Apr-16 231 839
;
run;


proc sort data=have out=temp1;
  by position start_date employee;
run;

 

data temp2(keep=position date priority employee);
 set temp1;
  by position start_date employee;
       retain priority;
      if first.position then priority = 0;
      if first.employee then priority+1;
      do date=start_date to end_date; output; end;
      format date ddmmyy10.;
run;


proc sort data=temp2 out=temp3;
  by position date priority ;
run;


data temp4 /* (drop=priority) */;
  set temp3;
   by position notsorted date;
   if last.date;
run;

 

data want(drop=date priority);
  set temp4;
   by position notsorted employee priority;
        retain start_date;
        format start_date end_date ddmmyy10.;
        if first.priority then start_date = date;
        if last.priority then do;
           end_date = date;
           output;
       end;
run;

 

 

Thanks again. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 371 views
  • 2 likes
  • 4 in conversation