Hi, I want to track changes of a person's status and find their minimum start and maximum end dates for each change. The raw datatable below is a sample where you can see for person A the minimum start and maximum end date for their Main status are 01Jan15 to 01Feb16. So for Row 1 and 2 there's no change in status and since EndDt for row 1 is 06July15 and StartDt for row 2 is 07July15 there's no break in the period, and the two rows should be aggregated into one row start on 01Jan15 and end on 01Feb16.
The 3rd row should be kept because there's a break in the period even though status stays in Main, StartDt is 01Oct16 and EndDt is 31Dec16.
The 4th and 5th row should be aggregated into one row that reads status in Co and StartDt is 01Jan17 and EndDt is 01Sep17.
The 6th row should be kept unchanged start from 30Oct17 to 05Nov17.
The desired reduced datatable is copied below the raw datatable
Raw datatable
Row | ID | Status | StartDt | EndDt |
1 | A | Main | 01-Jan-15 | 06-Jul-15 |
2 | A | Main | 07-Jul-15 | 01-Feb-16 |
3 | A | Main | 01-Oct-16 | 31-Dec-16 |
4 | A | Co | 01-Jan-17 | 07-Jul-17 |
5 | A | Co | 08-Jul-17 | 01-Sep-17 |
6 | A | Co | 30-Oct-17 | 05-Nov-17 |
Desired reduced datatable
ID | Status | MinStartDt | MaxEndDt |
A | Main | 01-Jan-15 | 01-Feb-16 |
A | Main | 01-Oct-16 | 31-Dec-16 |
A | Co | 01-Jan-17 | 01-Sep-17 |
A | Co | 30-Oct-17 | 05-Nov-17 |
This problem has been bugging me for days, I appreciate any suggestions I could get. Also I am new to this, so could you please give more details in your answers?
Many Thanks!
Michelle
data have;
infile cards expandtabs truncover;
input Row ID $ Status $ StartDt : date9. EndDt : date9.;
format StartDt EndDt date9.;
cards;
1 A Main 01-Jan-15 06-Jul-15
2 A Main 07-Jul-15 01-Feb-16
3 A Main 01-Oct-16 31-Dec-16
4 A Co 01-Jan-17 07-Jul-17
5 A Co 08-Jul-17 01-Sep-17
6 A Co 30-Oct-17 05-Nov-17
;
run;
data temp;
set have;
by id status notsorted;
if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
set temp;
by group;
retain start .;
if first.group then start=startdt;
if last.group;
format start date9.;
drop row startdt;
run;
1) sort dataset by is status startDT.
2) use retain and lag function to save previous obs values of startDT endDT.
3) on first.status keep the retained startDT value into a new variable.
4) on last.status override the current startDT value with the new retained startDT and do output.
else (any obs neither first nor last) - check is startDT > retained endDT +1.
if positive do:
- output
- override retained startDT with the current startDT.
cretae your code and come back if you have any issue.
data have;
infile cards expandtabs truncover;
input Row ID $ Status $ StartDt : date9. EndDt : date9.;
format StartDt EndDt date9.;
cards;
1 A Main 01-Jan-15 06-Jul-15
2 A Main 07-Jul-15 01-Feb-16
3 A Main 01-Oct-16 31-Dec-16
4 A Co 01-Jan-17 07-Jul-17
5 A Co 08-Jul-17 01-Sep-17
6 A Co 30-Oct-17 05-Nov-17
;
run;
data temp;
set have;
by id status notsorted;
if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
set temp;
by group;
retain start .;
if first.group then start=startdt;
if last.group;
format start date9.;
drop row startdt;
run;
If you create a flag to indicate where the breaks in time occur in your data set, then PROC SUMMARY will do the rest.
Something like (untested code)
data want; set have; by id status; prev_enddt=lag(enddt); if first.status or prev_enddt^=(startdt-1) then flag+1; run; proc summary data = want nway; class id status flag; var startdt enddt; output out=want2 min(startdt)=min_startdt max(enddt)=max_enddt; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.