I have a dataset containing the start date (date_initial) and the end date (date_final) that an employee has assumed a function (effective or temporary) in a company. I would like to reduce the number of rows, by type of function assumed (effective or temporary) in the company, since the start date of a most recently assumed function was obtained the day after the completion of another previous function.
For example, the employee Douglas assumed the oldest effective function from 03/15/2013 until 04/19/2016. Subsequently, he uninterruptedly assumed other functions in an effective manner, and the latter ended on 06/15/2018. In this case, the employee would only keep an observation (row) containing the initial date on 03/15/2013 when he assumed the first function and the end date on 06/15/2018, when he assumed the last effective function.
The following is part of the dataset:
| obs | name | function_type | date_initial | date_final |
| 1 | Douglas | effective | 15/03/2013 | 19/04/2015 |
| 2 | Douglas | effective | 20/04/2015 | 07/08/2016 |
| 3 | Douglas | effective | 08/08/2016 | 15/06/2018 |
| 4 | Douglas | temporary | 16/06/2018 | 16/12/2018 |
| 5 | Emily | effective | 15/02/2015 | 16/09/2017 |
| 6 | Emily | effective | 17/09/2017 | 15/12/2018 |
| 7 | Olivia | effective | 03/01/2014 | 25/04/2016 |
| 8 | Olivia | effective | 17/07/2016 | 20/11/2017 |
| 9 | Olivia | temporary | 21/11/2017 | 03/03/2018 |
And here is the result that I hope to obtain:
| obs | name | function_type | date_initial | date_final |
| 1 | Douglas | effective | 15/03/2013 | 15/06/2018 |
| 2 | Douglas | temporary | 16/06/2018 | 16/12/2018 |
| 3 | Emily | effective | 15/02/2015 | 15/12/2018 |
| 4 | Olivia | effective | 03/01/2014 | 25/04/2016 |
| 5 | Olivia | effective | 17/07/2016 | 20/11/2017 |
| 6 | Olivia | temporary | 21/11/2017 | 03/03/2018 |
I would appreciate it if you could help me solve this problem.
data have;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
lag_date=lag(date_final);
format date_initial date_final lag_date date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
data want;
set have;
by name function_type date_initial;
if first.function_type then lag_date=.;
dif=date_initial-lag_date;
if first.date_initial and (dif eq . or dif >1);
drop lag_date dif;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data result;
set a;
by name function_type date_initial;
retain nstart nend;
select;
when(first.function_type) do;
nstart=date_initial;nend=date_final;
end;
when(date_initial=nend+1) do;
nend=date_final;
end;
otherwise do;
nstart=date_initial;nend=date_final;
end;
end;
run;
data result;
set result;
by name function_type nstart nend;
if last.nstart then do;
date_initial=nstart;
output;
end;
drop nstart nend;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final ddmmyy10.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data b;
set a;
by name function_type;
if first.function_type then group=0;
if date_initial-lag(date_final) > 1 then group+1;
run;
data want;
set b(rename=(date_initial=_date_initial));
by name function_type group;
retain date_initial;
if first.group then date_initial=_date_initial;
if last.group;
drop group _date_initial;
format date_initial ddmmyy10.;
run;
data a;
input name:$ function_type:$9. date_initial:ddmmyy10. date_final:ddmmyy10.;
format date_initial date_final date9.;
cards;
Douglas effective 15/03/2013 19/04/2015 S
Douglas effective 20/04/2015 07/08/2016 -
Douglas effective 08/08/2016 15/06/2018 F
Douglas temporary 16/06/2018 16/12/2018 SF
Emily effective 15/02/2015 16/09/2017 S
Emily effective 17/09/2017 15/12/2018 F
Olivia effective 03/01/2014 25/04/2016 SF
Olivia effective 17/07/2016 20/11/2017 SF
Olivia temporary 21/11/2017 03/03/2018 SF
;
run;
data want (drop=_:);
set a (keep=name function_type);
by name function_type;
merge a
a (firstobs=2 keep=date_initial rename=(date_initial=_nxt_initial));
retain _save_initial ;
if first.function_type or date_initial-1 ^=lag(date_final)
then _save_initial=date_initial;
if last.function_type or _nxt_initial^=date_final+1;
date_initial=_save_initial;
run;
How it works:
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.