Lapis Lazuli | Level 10

## Date sequence Problem

Hi All,

``````data have;
input SUBJECT START_DATE \$9. END_DATE \$10. seq;
cards;
3008 01OCT2020 01OCT2020 1
3008 02OCT2020 04OCT2020 2
3008 05OCT2020 06OCT2020 3
3008 08OCT2020 05OCT2020 4
3009 03OCT2020 03OCT2020 1
3009 04OCT2020 05OCT2020 2
3009 05OCT2020 05OCT2020 3
3009 06OCT2020 07OCT2020 4
;
run;``````

May requirement is:

1. If First event has started and ended, after ending first event the second event must be start next to first ending date then min START_DATE and max END_DATE need to be consider in final output as First dataset.

2. If first event end date and second event start date has more then one day difference then it must be flagged. in second dataset.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Date sequence Problem

Please show what you expect as output, if this is the given input.
3 REPLIES 3
Super User

## Re: Date sequence Problem

Please show what you expect as output, if this is the given input.
Lapis Lazuli | Level 10

## Re: Date sequence Problem

@Reeza : Sorry by mistake i clicked on solution button.

here is my input and output requirement.

``````data have;
input SUBJECT START_DATE \$9. END_DATE \$10. seq;
cards;
3008 01OCT2020 01OCT2020 1
3008 02OCT2020 04OCT2020 2
3008 05OCT2020 06OCT2020 3
3008 08OCT2020 10OCT2020 4
3009 03OCT2020 03OCT2020 1
3009 04OCT2020 05OCT2020 2
3009 05OCT2020 05OCT2020 3
3009 06OCT2020 07OCT2020 4
3010 13OCT2020 15OCT2020 1
3010 17OCT2020 18OCT2020 2
;
run;

Output:

First Dataset:

SUBJECT START_DATE END_DATE
3008    01OCT2020  06OCT2020
3009	03OCT2020  07OCT2020
3010	13OCT2020  15OCT2020

Second Dataset:

SUBJECT START_DATE END_DATE
3008	 08OCT2020 10OCT2020
3010	 17OCT2020 18OCT2020

``````

Thank you

Super User

## Re: Date sequence Problem

Sounds like you just want to collapse overlapping (or close to overlapping) periods into one period.  Not sure why you want two datasets out when one dataset can contain all of the periods with just a separate variable to indicate which period it is.

So first make sure your dates are actual date variables (and not strings like you example).

``````data have;
input SUBJECT (START_DATE END_DATE) (:date.) seq;
format start_date end_date date9.;
cards;
3008 01OCT2020 01OCT2020 1
3008 02OCT2020 04OCT2020 2
3008 05OCT2020 06OCT2020 3
3008 08OCT2020 10OCT2020 4
3009 03OCT2020 03OCT2020 1
3009 04OCT2020 05OCT2020 2
3009 05OCT2020 05OCT2020 3
3009 06OCT2020 07OCT2020 4
3010 13OCT2020 15OCT2020 1
3010 17OCT2020 18OCT2020 2
;``````

Then generate a PERIOD variable to flag the overlapping periods.  Adjust the IF condition if you want to allow different size gap between start and end to be considered part of the same large period.

Including START_DATE and END_DATE in the BY statement will cause the data step to insure the observations are sorted properly.

``````data periods ;
set have ;
by subject start_date end_date ;
lag_end = lag(end_date);
if first.subject then period=1;
else if start_date > lag_end+1 then period+1;
drop lag_end;
run;``````

Now collapse to one observation per derived period.

``````data want;
set periods ;
by subject period;
if first.period then start=start_date ;
if last.period then do;
start_date=start;
output;
end;
retain start;
drop start seq;
run;``````

Results:

```                     START_
Obs    SUBJECT         DATE     END_DATE    period

1       3008     01OCT2020    06OCT2020       1
2       3008     08OCT2020    10OCT2020       2
3       3009     03OCT2020    07OCT2020       1
4       3010     13OCT2020    15OCT2020       1
5       3010     17OCT2020    18OCT2020       2
```

Note if the original start/end date data is messier with overlapping or nested time periods you will need more complex logic to determine the overlapping periods.

Discussion stats
• 3 replies
• 525 views
• 1 like
• 3 in conversation