I have a dataset containing the start date (date_initial) and the end date (date_final) that an employee has assumed a function (effective or temporary) in a company. I would like to reduce the number of rows, by type of function assumed (effective or temporary) in the company, since the start date of a most recently assumed function was obtained the day after the completion of another previous function.
For example, the employee Douglas assumed the oldest effective function from 03/15/2013 until 04/19/2016. Subsequently, he uninterruptedly assumed other functions in an effective manner, and the latter ended on 06/15/2018. In this case, the employee would only keep an observation (row) containing the initial date on 03/15/2013 when he assumed the first function and the end date on 06/15/2018, when he assumed the last effective function.
The following is part of the dataset:
obs | name | function_type | date_initial | date_final |
1 | Douglas | effective | 15/03/2013 | 19/04/2015 |
2 | Douglas | effective | 20/04/2015 | 07/08/2016 |
3 | Douglas | effective | 08/08/2016 | 15/06/2018 |
4 | Douglas | temporary | 16/06/2018 | 16/12/2018 |
5 | Emily | effective | 15/02/2015 | 16/09/2017 |
6 | Emily | effective | 17/09/2017 | 15/12/2018 |
7 | Olivia | effective | 03/01/2014 | 25/04/2016 |
8 | Olivia | effective | 17/07/2016 | 20/11/2017 |
9 | Olivia | temporary | 21/11/2017 | 03/03/2018 |
And here is the result that I hope to obtain:
obs | name | function_type | date_initial | date_final |
1 | Douglas | effective | 15/03/2013 | 15/06/2018 |
2 | Douglas | temporary | 16/06/2018 | 16/12/2018 |
3 | Emily | effective | 15/02/2015 | 15/12/2018 |
4 | Olivia | effective | 03/01/2014 | 25/04/2016 |
5 | Olivia | effective | 17/07/2016 | 20/11/2017 |
6 | Olivia | temporary | 21/11/2017 | 03/03/2018 |
I would appreciate it if you could help me solve this problem.
data have;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
lag_date=lag(date_final);
format date_initial date_final lag_date date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
data want;
set have;
by name function_type date_initial;
if first.function_type then lag_date=.;
dif=date_initial-lag_date;
if first.date_initial and (dif eq . or dif >1);
drop lag_date dif;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data result;
set a;
by name function_type date_initial;
retain nstart nend;
select;
when(first.function_type) do;
nstart=date_initial;nend=date_final;
end;
when(date_initial=nend+1) do;
nend=date_final;
end;
otherwise do;
nstart=date_initial;nend=date_final;
end;
end;
run;
data result;
set result;
by name function_type nstart nend;
if last.nstart then do;
date_initial=nstart;
output;
end;
drop nstart nend;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final ddmmyy10.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data b;
set a;
by name function_type;
if first.function_type then group=0;
if date_initial-lag(date_final) > 1 then group+1;
run;
data want;
set b(rename=(date_initial=_date_initial));
by name function_type group;
retain date_initial;
if first.group then date_initial=_date_initial;
if last.group;
drop group _date_initial;
format date_initial ddmmyy10.;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final date9.;
cards;
Douglas effective 15/03/2013 19/04/2015 S
Douglas effective 20/04/2015 07/08/2016 -
Douglas effective 08/08/2016 15/06/2018 F
Douglas temporary 16/06/2018 16/12/2018 SF
Emily effective 15/02/2015 16/09/2017 S
Emily effective 17/09/2017 15/12/2018 F
Olivia effective 03/01/2014 25/04/2016 SF
Olivia effective 17/07/2016 20/11/2017 SF
Olivia temporary 21/11/2017 03/03/2018 SF
;
run;
data want (drop=_:);
set a (keep=name function_type);
by name function_type;
merge a
a (firstobs=2 keep=date_initial rename=(date_initial=_nxt_initial));
retain _save_initial ;
if first.function_type or date_initial-1 ^=lag(date_final)
then _save_initial=date_initial;
if last.function_type or _nxt_initial^=date_final+1;
date_initial=_save_initial;
run;
How it works:
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.