## How to summarize dataset by Min.and Max date value by Group

Solved
Occasional Contributor
Posts: 5

# How to summarize dataset by Min.and Max date value by Group

Hi, I want to track changes of a person's status and find their minimum start and maximum end dates for each change. The raw datatable below is a sample where you can see for person A the minimum start and maximum end date for their Main status are 01Jan15 to 01Feb16. So for Row 1 and 2 there's no change in status and since EndDt for row 1 is 06July15 and StartDt for row 2 is 07July15 there's no break in the period, and the two rows should be aggregated into one row start on 01Jan15 and end on 01Feb16.

The 3rd row should be kept because there's a break in the period even though status stays in Main, StartDt is 01Oct16 and EndDt is 31Dec16.

The 4th and 5th row should be aggregated into one row that reads status in Co and StartDt is 01Jan17 and EndDt is 01Sep17.

The 6th row should be kept unchanged start from 30Oct17 to 05Nov17.

The desired reduced datatable is copied below the raw datatable

Raw datatable

 Row ID Status StartDt EndDt 1 A Main 01-Jan-15 06-Jul-15 2 A Main 07-Jul-15 01-Feb-16 3 A Main 01-Oct-16 31-Dec-16 4 A Co 01-Jan-17 07-Jul-17 5 A Co 08-Jul-17 01-Sep-17 6 A Co 30-Oct-17 05-Nov-17

Desired reduced datatable

 ID Status MinStartDt MaxEndDt A Main 01-Jan-15 01-Feb-16 A Main 01-Oct-16 31-Dec-16 A Co 01-Jan-17 01-Sep-17 A Co 30-Oct-17 05-Nov-17

This problem has been bugging me for days, I appreciate any suggestions I could get. Also I am new to this, so could you please give more details in your answers?

Many Thanks!

Michelle

Accepted Solutions
Solution
‎11-30-2017 02:42 PM
Super User
Posts: 10,691

## Re: How to summarize dataset by Min.and Max date value by Group

``````data have;
infile cards expandtabs truncover;
input Row	ID \$	Status \$	StartDt : date9.	EndDt : date9.;
format StartDt EndDt date9.;
cards;
1	A	Main	01-Jan-15	06-Jul-15
2	A	Main	07-Jul-15	01-Feb-16
3	A	Main	01-Oct-16	31-Dec-16
4	A	Co	01-Jan-17	07-Jul-17
5	A	Co	08-Jul-17	01-Sep-17
6	A	Co	30-Oct-17	05-Nov-17
;
run;
data temp;
set have;
by id status notsorted;
if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
set temp;
by group;
retain start .;
if first.group then start=startdt;
if last.group;
format start date9.;
drop row startdt;
run;``````

All Replies
Posts: 1,831

## Re: How to summarize dataset by Min.and Max date value by Group

[ Edited ]

1) sort dataset by is status startDT.

2) use retain and lag function to save previous obs values of startDT endDT.

3) on first.status keep the retained startDT value into a new variable.

4) on last.status override the current startDT value with the new retained startDT and do output.

else (any obs neither first nor last) - check is startDT > retained endDT +1.

if positive do:

-  output

- override retained startDT with the current startDT.

cretae your code and come back if you have any issue.

Solution
‎11-30-2017 02:42 PM
Super User
Posts: 10,691

## Re: How to summarize dataset by Min.and Max date value by Group

``````data have;
infile cards expandtabs truncover;
input Row	ID \$	Status \$	StartDt : date9.	EndDt : date9.;
format StartDt EndDt date9.;
cards;
1	A	Main	01-Jan-15	06-Jul-15
2	A	Main	07-Jul-15	01-Feb-16
3	A	Main	01-Oct-16	31-Dec-16
4	A	Co	01-Jan-17	07-Jul-17
5	A	Co	08-Jul-17	01-Sep-17
6	A	Co	30-Oct-17	05-Nov-17
;
run;
data temp;
set have;
by id status notsorted;
if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
set temp;
by group;
retain start .;
if first.group then start=startdt;
if last.group;
format start date9.;
drop row startdt;
run;``````
Occasional Contributor
Posts: 5

## Re: How to summarize dataset by Min.and Max date value by Group

I understand this now. Thankyou!
Posts: 2,833

## Re: How to summarize dataset by Min.and Max date value by Group

[ Edited ]

If you create a flag to indicate where the breaks in time occur in your data set, then PROC SUMMARY will do the rest.

Something like (untested code)

```data want;
set have;
by id status;
prev_enddt=lag(enddt);
if first.status or prev_enddt^=(startdt-1) then flag+1;
run;

proc summary data = want nway;
class id status flag;
var startdt enddt;
output out=want2 min(startdt)=min_startdt max(enddt)=max_enddt;
run;```

--
Paige Miller
☑ This topic is solved.