BookmarkSubscribeRSS Feed
Ronnyclay32
Calcite | Level 5

I have a dataset containing the start date (date_initial) and the end date (date_final) that an employee has assumed a function (effective or temporary) in a company. I would like to reduce the number of rows, by type of function assumed (effective or temporary) in the company, since the start date of a most recently assumed function was obtained the day after the completion of another previous function.

 

For example, the employee Douglas assumed the oldest effective function from 03/15/2013 until 04/19/2016. Subsequently, he uninterruptedly assumed other functions in an effective manner, and the latter ended on 06/15/2018. In this case, the employee would only keep an observation (row) containing the initial date on 03/15/2013 when he assumed the first function and the end date on 06/15/2018, when he assumed the last effective function.

 

The following is part of the dataset:

obsnamefunction_typedate_initialdate_final
1Douglaseffective15/03/201319/04/2015
2Douglaseffective20/04/201507/08/2016
3Douglaseffective08/08/201615/06/2018
4Douglastemporary16/06/201816/12/2018
5Emilyeffective15/02/201516/09/2017
6Emilyeffective17/09/201715/12/2018
7Oliviaeffective03/01/201425/04/2016
8Oliviaeffective17/07/201620/11/2017
9Oliviatemporary21/11/201703/03/2018

 

And here is the result that I hope to obtain:

obsnamefunction_typedate_initialdate_final
1Douglaseffective15/03/201315/06/2018
2Douglastemporary16/06/201816/12/2018
3Emilyeffective15/02/201515/12/2018
4Oliviaeffective03/01/201425/04/2016
5Oliviaeffective17/07/201620/11/2017
6Oliviatemporary21/11/201703/03/2018

 

I would appreciate it if you could help me solve this problem.

4 REPLIES 4
Jagadishkatam
Amethyst | Level 16
data have;
input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
lag_date=lag(date_final);
format date_initial date_final lag_date date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;

data want;
set have;
by name function_type date_initial;
if first.function_type then lag_date=.;
dif=date_initial-lag_date;
if first.date_initial and (dif eq . or dif >1); 
drop lag_date dif;
run;
Thanks,
Jag
learsaas
Quartz | Level 8
data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final date9.;
	cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data result;
	set a;
	by name function_type date_initial;
	retain nstart nend;
	select;
	   when(first.function_type) do;
			nstart=date_initial;nend=date_final;
	   end;
	   when(date_initial=nend+1) do;
			 nend=date_final;
	   end;
	   otherwise do;
	  		 nstart=date_initial;nend=date_final;
	   end;
	end;
run;
data result;
	set result;
	by name function_type nstart nend;
	if last.nstart then do;
		date_initial=nstart;
		output;
	end;
	drop nstart nend;
run;
Ksharp
Super User
data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final  ddmmyy10.;
	cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data b;
 set a;
 by name function_type;
 if first.function_type then group=0;
 if date_initial-lag(date_final) > 1 then group+1;
run;
data want;
 set b(rename=(date_initial=_date_initial));
 by name function_type group;
 retain date_initial;
 if first.group then date_initial=_date_initial;
 if last.group;
 drop group _date_initial;
 format date_initial ddmmyy10.;
run;
mkeintz
PROC Star
data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final date9.;
	cards;
Douglas effective 15/03/2013 19/04/2015  S
Douglas effective 20/04/2015 07/08/2016  -
Douglas effective 08/08/2016 15/06/2018  F
Douglas temporary 16/06/2018 16/12/2018  SF
Emily effective 15/02/2015 16/09/2017    S
Emily effective 17/09/2017 15/12/2018    F
Olivia effective 03/01/2014 25/04/2016   SF
Olivia effective 17/07/2016 20/11/2017   SF
Olivia temporary 21/11/2017 03/03/2018   SF
;
run;
data want (drop=_:);
  set a (keep=name function_type);
  by name function_type;
  merge a 
        a (firstobs=2 keep=date_initial rename=(date_initial=_nxt_initial));

  retain _save_initial ;
 
  if first.function_type or date_initial-1 ^=lag(date_final) 
    then _save_initial=date_initial;

  if last.function_type or _nxt_initial^=date_final+1;
  date_initial=_save_initial;
run;

 

How it works:

  1. The SET and BY statements are there merely to generate the automatic dummy vars first.function_type and last.function_type.  This lets you know when the observation in hand is the beginning (or end) of a function_type group.
  2. The merge of A with itself provides a way to look ahead one observation (the firstobs=2 option).  The lookahead record keeps only one variable - date_initial - which is renamed to _nxt_initial to keep it distinct from date_initial in the current record.
  3. retain _save_initial statement is needed because if a time span covers more the one observation, the corresponding beginning date_initial value has to be retained across observations.
  4. The "if first.function_type or ... " detects the start of any time-span and saves the date_initial value.
  5. The "if last.function_type or ..." is a subsetting if, keeping only observations that must represent the end of a time-span.
  6. Having passed the subsetting if filter, the date_initial value is recovered from the earlier saved value.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1006 views
  • 0 likes
  • 5 in conversation