<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I reduce duplicate rows/observations by keeping records of different observations? in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/522050#M4344</link>
    <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final date9.;
	cards;
Douglas effective 15/03/2013 19/04/2015  S
Douglas effective 20/04/2015 07/08/2016  -
Douglas effective 08/08/2016 15/06/2018  F
Douglas temporary 16/06/2018 16/12/2018  SF
Emily effective 15/02/2015 16/09/2017    S
Emily effective 17/09/2017 15/12/2018    F
Olivia effective 03/01/2014 25/04/2016   SF
Olivia effective 17/07/2016 20/11/2017   SF
Olivia temporary 21/11/2017 03/03/2018   SF
;
run;
data want (drop=_:);
  set a (keep=name function_type);
  by name function_type;
  merge a 
        a (firstobs=2 keep=date_initial rename=(date_initial=_nxt_initial));

  retain _save_initial ;
 
  if first.function_type or date_initial-1 ^=lag(date_final) 
    then _save_initial=date_initial;

  if last.function_type or _nxt_initial^=date_final+1;
  date_initial=_save_initial;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How it works:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The SET and BY statements are there merely to generate the automatic dummy vars first.function_type and last.function_type.&amp;nbsp; This lets you know when the observation in hand is the beginning (or end) of a function_type group.&lt;/LI&gt;
&lt;LI&gt;The merge of A with itself provides a way to look ahead one observation (the firstobs=2 option).&amp;nbsp; The lookahead record keeps only one variable - date_initial - which is renamed to _nxt_initial to keep it distinct from date_initial in the current record.&lt;/LI&gt;
&lt;LI&gt;retain _save_initial statement is needed because if a time span covers more the one observation,&amp;nbsp;the corresponding beginning date_initial value has to be retained across observations.&lt;/LI&gt;
&lt;LI&gt;The "if first.function_type or ... " detects&amp;nbsp;the start of any time-span and saves the date_initial value.&lt;/LI&gt;
&lt;LI&gt;The "if last.function_type or ..." is a subsetting if, keeping only observations that must represent the end of a time-span.&lt;/LI&gt;
&lt;LI&gt;Having passed the subsetting if filter, the date_initial value is recovered from the earlier saved value.&lt;/LI&gt;
&lt;/OL&gt;</description>
    <pubDate>Mon, 17 Dec 2018 22:36:26 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2018-12-17T22:36:26Z</dc:date>
    <item>
      <title>How do I reduce duplicate rows/observations by keeping records of different observations?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521753#M4278</link>
      <description>&lt;P&gt;I have a dataset containing the start date (date_initial) and the end date (date_final) that an employee has assumed a function (effective or temporary) in a company. I would like to reduce the number of rows, by type of function assumed (effective or temporary) in the company, since the start date of a most recently assumed function was obtained the day after the completion of another previous function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, the employee Douglas assumed the oldest &lt;SPAN&gt;effective&amp;nbsp;&lt;/SPAN&gt;function from 03/15/2013 until 04/19/2016. Subsequently, he uninterruptedly assumed other functions in an effective manner, and the latter ended on 06/15/2018. In this case, the employee would only keep an observation (row) containing the initial date on 03/15/2013 when he assumed the first function and the end date on 06/15/2018, when he assumed the last effective function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The following is part of the dataset:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;obs&lt;/TD&gt;&lt;TD&gt;name&lt;/TD&gt;&lt;TD&gt;function_type&lt;/TD&gt;&lt;TD&gt;date_initial&lt;/TD&gt;&lt;TD&gt;date_final&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;15/03/2013&lt;/TD&gt;&lt;TD&gt;19/04/2015&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;20/04/2015&lt;/TD&gt;&lt;TD&gt;07/08/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;08/08/2016&lt;/TD&gt;&lt;TD&gt;15/06/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;temporary&lt;/TD&gt;&lt;TD&gt;16/06/2018&lt;/TD&gt;&lt;TD&gt;16/12/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;Emily&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;15/02/2015&lt;/TD&gt;&lt;TD&gt;16/09/2017&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;Emily&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;17/09/2017&lt;/TD&gt;&lt;TD&gt;15/12/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;03/01/2014&lt;/TD&gt;&lt;TD&gt;25/04/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;17/07/2016&lt;/TD&gt;&lt;TD&gt;20/11/2017&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;temporary&lt;/TD&gt;&lt;TD&gt;21/11/2017&lt;/TD&gt;&lt;TD&gt;03/03/2018&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And here is the result that I hope to obtain:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;obs&lt;/TD&gt;&lt;TD&gt;name&lt;/TD&gt;&lt;TD&gt;function_type&lt;/TD&gt;&lt;TD&gt;date_initial&lt;/TD&gt;&lt;TD&gt;date_final&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;15/03/2013&lt;/TD&gt;&lt;TD&gt;15/06/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Douglas&lt;/TD&gt;&lt;TD&gt;temporary&lt;/TD&gt;&lt;TD&gt;16/06/2018&lt;/TD&gt;&lt;TD&gt;16/12/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Emily&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;15/02/2015&lt;/TD&gt;&lt;TD&gt;15/12/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;03/01/2014&lt;/TD&gt;&lt;TD&gt;25/04/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;effective&lt;/TD&gt;&lt;TD&gt;17/07/2016&lt;/TD&gt;&lt;TD&gt;20/11/2017&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;Olivia&lt;/TD&gt;&lt;TD&gt;temporary&lt;/TD&gt;&lt;TD&gt;21/11/2017&lt;/TD&gt;&lt;TD&gt;03/03/2018&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would appreciate it if you could help me solve this problem.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Dec 2018 23:33:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521753#M4278</guid>
      <dc:creator>Ronnyclay32</dc:creator>
      <dc:date>2018-12-15T23:33:53Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce duplicate rows/observations by keeping records of different observations?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521828#M4301</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
lag_date=lag(date_final);
format date_initial date_final lag_date date9.;
cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;

data want;
set have;
by name function_type date_initial;
if first.function_type then lag_date=.;
dif=date_initial-lag_date;
if first.date_initial and (dif eq . or dif &amp;gt;1); 
drop lag_date dif;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 16 Dec 2018 23:53:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521828#M4301</guid>
      <dc:creator>Jagadishkatam</dc:creator>
      <dc:date>2018-12-16T23:53:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce duplicate rows/observations by keeping records of different observations?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521856#M4304</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final date9.;
	cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data result;
	set a;
	by name function_type date_initial;
	retain nstart nend;
	select;
	   when(first.function_type) do;
			nstart=date_initial;nend=date_final;
	   end;
	   when(date_initial=nend+1) do;
			 nend=date_final;
	   end;
	   otherwise do;
	  		 nstart=date_initial;nend=date_final;
	   end;
	end;
run;
data result;
	set result;
	by name function_type nstart nend;
	if last.nstart then do;
		date_initial=nstart;
		output;
	end;
	drop nstart nend;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 17 Dec 2018 08:34:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521856#M4304</guid>
      <dc:creator>learsaas</dc:creator>
      <dc:date>2018-12-17T08:34:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce duplicate rows/observations by keeping records of different observations?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521892#M4315</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final  ddmmyy10.;
	cards;
Douglas effective 15/03/2013 19/04/2015
Douglas effective 20/04/2015 07/08/2016
Douglas effective 08/08/2016 15/06/2018
Douglas temporary 16/06/2018 16/12/2018
Emily effective 15/02/2015 16/09/2017
Emily effective 17/09/2017 15/12/2018
Olivia effective 03/01/2014 25/04/2016
Olivia effective 17/07/2016 20/11/2017
Olivia temporary 21/11/2017 03/03/2018
;
run;
data b;
 set a;
 by name function_type;
 if first.function_type then group=0;
 if date_initial-lag(date_final) &amp;gt; 1 then group+1;
run;
data want;
 set b(rename=(date_initial=_date_initial));
 by name function_type group;
 retain date_initial;
 if first.group then date_initial=_date_initial;
 if last.group;
 drop group _date_initial;
 format date_initial ddmmyy10.;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 17 Dec 2018 13:02:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/521892#M4315</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-12-17T13:02:46Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce duplicate rows/observations by keeping records of different observations?</title>
      <link>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/522050#M4344</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
	input 	name:$	function_type:$9.	date_initial:ddmmyy10.	date_final:ddmmyy10.;
	format date_initial date_final date9.;
	cards;
Douglas effective 15/03/2013 19/04/2015  S
Douglas effective 20/04/2015 07/08/2016  -
Douglas effective 08/08/2016 15/06/2018  F
Douglas temporary 16/06/2018 16/12/2018  SF
Emily effective 15/02/2015 16/09/2017    S
Emily effective 17/09/2017 15/12/2018    F
Olivia effective 03/01/2014 25/04/2016   SF
Olivia effective 17/07/2016 20/11/2017   SF
Olivia temporary 21/11/2017 03/03/2018   SF
;
run;
data want (drop=_:);
  set a (keep=name function_type);
  by name function_type;
  merge a 
        a (firstobs=2 keep=date_initial rename=(date_initial=_nxt_initial));

  retain _save_initial ;
 
  if first.function_type or date_initial-1 ^=lag(date_final) 
    then _save_initial=date_initial;

  if last.function_type or _nxt_initial^=date_final+1;
  date_initial=_save_initial;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How it works:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The SET and BY statements are there merely to generate the automatic dummy vars first.function_type and last.function_type.&amp;nbsp; This lets you know when the observation in hand is the beginning (or end) of a function_type group.&lt;/LI&gt;
&lt;LI&gt;The merge of A with itself provides a way to look ahead one observation (the firstobs=2 option).&amp;nbsp; The lookahead record keeps only one variable - date_initial - which is renamed to _nxt_initial to keep it distinct from date_initial in the current record.&lt;/LI&gt;
&lt;LI&gt;retain _save_initial statement is needed because if a time span covers more the one observation,&amp;nbsp;the corresponding beginning date_initial value has to be retained across observations.&lt;/LI&gt;
&lt;LI&gt;The "if first.function_type or ... " detects&amp;nbsp;the start of any time-span and saves the date_initial value.&lt;/LI&gt;
&lt;LI&gt;The "if last.function_type or ..." is a subsetting if, keeping only observations that must represent the end of a time-span.&lt;/LI&gt;
&lt;LI&gt;Having passed the subsetting if filter, the date_initial value is recovered from the earlier saved value.&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 17 Dec 2018 22:36:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-records/m-p/522050#M4344</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-12-17T22:36:26Z</dc:date>
    </item>
  </channel>
</rss>

