BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rlafond
Obsidian | Level 7

 

 

I'm struggling to come up with a solution to this problem and could use some input.

 

For each ID, I have a variable amount of date ranges. I am trying to create a single row for each ID that shows 1) the most recent end date (DateThru); and 2) the earliest start date (DateFrom) without a break greater than 1 day (using the most recent DateThru as an anchor).

 

Here is an idea of what my data looks like:

 

IDDateFromDateThru
12310/1/20045/25/2006
1235/26/20068/28/2007
1238/29/20073/17/2008
1233/18/200810/14/2008
12312/7/201512/31/2015
1232/12/201612/31/9999

 

In this example I would want to only output the last row. The second newest date range ended 12/31/15, so it ended greater than 1 day before 2/12/16. All other records are immaterial.

 

Here is another example:

 

IDDateFromDateThru
2211/1/20055/26/2008
22111/11/20118/1/2015
2218/2/201510/12/2015
22110/13/20156/30/2018
2217/1/201812/31/9999

 

In this example I would want to see a single row for ID=221 that shows DateFrom = 11/11/2011 and DateThru = 12/31/9999. All but the first row show continuous date ranges with gaps of 1 day only.

 

Any thoughts?

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

@rlafond - and here is yet a third way that doesn't require any joining of tables:

 

data have;
  input id datefrom :mmddyy10. datethru :mmddyy10.;
  format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008 
221 11/11/2011 8/1/2015 
221 8/2/2015 10/12/2015 
221 10/13/2015 6/30/2018 
221 7/1/2018 12/31/9999
; 
run;

proc sort data =  have;
  by id datefrom;
run;

data want;
  keep id datefrom_new datethru;
  rename datefrom_new = datefrom; 
  set have;
  by id;
  retain datefrom_new datethru_last;
  format datefrom_new date9.;
  if first.id then do;
    datethru_last = .;
    datefrom_new = datefrom;
  end;
  else do;
    if datefrom - datethru_last > 1 then datefrom_new = datefrom;
  end;
  if last.id then output; 
  datethru_last = datethru;
run;

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

What do you mean by "a break greater than one day"?  Does that mean the DATEFROM in a record must start on the day immediately following DATETHRU in the preceding record?  Or does it mean that DATEFROM can be the 2nd day after the prior DATETHRU, leaving exactly 1 day uncovered?  I.e.  what's a "break"?

 

Assuming you want no uncovered days, then you want to generate continuous time spans constructed from a sequence of records, and keep only the last of those time spans.  I assume your data are sorted by ID DATEFROM, and that DATETHRU is always greater than or equal to DATEFROM:

 

data have;
  input id datefrom :mmddyy10. datethru :mmddyy10.;
  format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008 
221 11/11/2011 8/1/2015 
221 8/2/2015 10/12/2015 
221 10/13/2015 6/30/2018 
221 7/1/2018 12/31/9999 
run;

data want (drop=_:);
  set have (keep=id);
  by id;
  merge have 
        have (firstobs=2 keep=datefrom rename=(datefrom=_nxt_from));

  if first.id=1 or datefrom-1>lag(datethru) then _datefrom=datefrom;
  else datefrom=_datefrom;

  if last.id=1;
  retain _datefrom;
run;

 

 

The task here is to detect whenever the incoming record is either the beginning of an id, or is more than 1 day after the preceding record.  In that case store the current DATEFROM value into a retained variable (_DATEFROM), otherwise assign the retained _DATEFROM value into the current DATEFROM variable.   No modifications are needed to DATETHRU.  Then just keep the last incoming record, which would have its DATEFROM value properly updated, if needed.

 

Note if you actually will permit uncovered single days in successive records, just change

 

if first.id=1 or datefrom-1>lag(datethru) then ...

to

 

if first.id=1 or datefrom-2>lag(datethru) then ...
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

data have;
input ID	(DateFrom 	DateThru) (:mmddyy10.);
format datefrom datethru mmddyy10.;
cards;
123	10/1/2004	5/25/2006
123	5/26/2006	8/28/2007
123	8/29/2007	3/17/2008
123	3/18/2008	10/14/2008
123	12/7/2015	12/31/2015
123	2/12/2016	12/31/9999
221	1/1/2005	5/26/2008
221	11/11/2011	8/1/2015
221	8/2/2015	10/12/2015
221	10/13/2015	6/30/2018
221	7/1/2018	12/31/9999
;
data temp;
set have;
by id;
if first.id then f=1; 
If not first.id and DateFrom ne lag(DateThru)+1 then f+1;
run;

proc sql;
create table want as
select ID, min(datefrom) as Datefrom format=mmddyy10., max(DateThru) as Datethru format=mmddyy10.
from (select * from temp group by id having f=max(f))
group by id, f ;
quit;
SASKiwi
PROC Star

@rlafond - and here is yet a third way that doesn't require any joining of tables:

 

data have;
  input id datefrom :mmddyy10. datethru :mmddyy10.;
  format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008 
221 11/11/2011 8/1/2015 
221 8/2/2015 10/12/2015 
221 10/13/2015 6/30/2018 
221 7/1/2018 12/31/9999
; 
run;

proc sort data =  have;
  by id datefrom;
run;

data want;
  keep id datefrom_new datethru;
  rename datefrom_new = datefrom; 
  set have;
  by id;
  retain datefrom_new datethru_last;
  format datefrom_new date9.;
  if first.id then do;
    datethru_last = .;
    datefrom_new = datefrom;
  end;
  else do;
    if datefrom - datethru_last > 1 then datefrom_new = datefrom;
  end;
  if last.id then output; 
  datethru_last = datethru;
run;
rlafond
Obsidian | Level 7
This wasn't the first answer, but I'm a sucker for DO loops. Thank you!
learsaas
Quartz | Level 8
data result;
	set a;
        by id datefrom;
	retain nstart nend;
	if first.id or datefrom^=nend+1 then nstart=datefrom;
	nend=datethru;
	if last.id then do;
		datefrom=nstart;output;
	end;
	drop nstart nend;
run;

https://communities.sas.com/t5/New-SAS-User/How-do-I-reduce-duplicate-rows-observations-by-keeping-r...

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1697 views
  • 4 likes
  • 5 in conversation