BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
topryde
Fluorite | Level 6

Hi, I want to track changes of a person's status and find their minimum start and maximum end dates for each change. The raw datatable below is a sample where you can see for person A the minimum start and maximum end date for their Main status are 01Jan15 to 01Feb16. So for Row 1 and 2 there's no change in status and since EndDt for row 1 is 06July15 and StartDt for row 2 is 07July15 there's no break in the period, and the two rows should be aggregated into one row start on 01Jan15 and end on 01Feb16.

 

The 3rd row should be kept because there's a break in the period even though status stays in Main, StartDt is 01Oct16 and EndDt is 31Dec16.

 

The 4th and 5th row should be aggregated into one row that reads status in Co and StartDt is 01Jan17 and EndDt is 01Sep17.

 

The 6th row should be kept unchanged start from 30Oct17 to 05Nov17.

The desired reduced datatable is copied below the raw datatable

 

Raw datatable

RowIDStatusStartDtEndDt
1AMain01-Jan-1506-Jul-15
2AMain07-Jul-1501-Feb-16
3AMain01-Oct-1631-Dec-16
4ACo01-Jan-1707-Jul-17
5ACo08-Jul-1701-Sep-17
6ACo30-Oct-1705-Nov-17

Desired reduced datatable

IDStatusMinStartDtMaxEndDt
AMain01-Jan-1501-Feb-16
AMain01-Oct-1631-Dec-16
ACo01-Jan-1701-Sep-17
ACo30-Oct-17

05-Nov-17

This problem has been bugging me for days, I appreciate any suggestions I could get. Also I am new to this, so could you please give more details in your answers?

 

Many Thanks!

Michelle

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Row	ID $	Status $	StartDt : date9.	EndDt : date9.;
format StartDt EndDt date9.;
cards;
1	A	Main	01-Jan-15	06-Jul-15
2	A	Main	07-Jul-15	01-Feb-16
3	A	Main	01-Oct-16	31-Dec-16
4	A	Co	01-Jan-17	07-Jul-17
5	A	Co	08-Jul-17	01-Sep-17
6	A	Co	30-Oct-17	05-Nov-17
;
run;
data temp;
 set have;
 by id status notsorted;
 if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
 set temp;
 by group;
 retain start .;
 if first.group then start=startdt;
 if last.group;
 format start date9.;
 drop row startdt;
run;

View solution in original post

4 REPLIES 4
Shmuel
Garnet | Level 18

1) sort dataset by is status startDT.

2) use retain and lag function to save previous obs values of startDT endDT.

3) on first.status keep the retained startDT value into a new variable.

4) on last.status override the current startDT value with the new retained startDT and do output.

    else (any obs neither first nor last) - check is startDT > retained endDT +1.

         if positive do:

              -  output

              - override retained startDT with the current startDT.

 

cretae your code and come back if you have any issue.

Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Row	ID $	Status $	StartDt : date9.	EndDt : date9.;
format StartDt EndDt date9.;
cards;
1	A	Main	01-Jan-15	06-Jul-15
2	A	Main	07-Jul-15	01-Feb-16
3	A	Main	01-Oct-16	31-Dec-16
4	A	Co	01-Jan-17	07-Jul-17
5	A	Co	08-Jul-17	01-Sep-17
6	A	Co	30-Oct-17	05-Nov-17
;
run;
data temp;
 set have;
 by id status notsorted;
 if first.status or startdt-lag(enddt) ne 1 then group+1;
run;
data want;
 set temp;
 by group;
 retain start .;
 if first.group then start=startdt;
 if last.group;
 format start date9.;
 drop row startdt;
run;
topryde
Fluorite | Level 6
I understand this now. Thankyou!
PaigeMiller
Diamond | Level 26

If you create a flag to indicate where the breaks in time occur in your data set, then PROC SUMMARY will do the rest.

 

Something like (untested code)

 

data want;
    set have;
    by id status;
    prev_enddt=lag(enddt);
    if first.status or prev_enddt^=(startdt-1) then flag+1;
run;

proc summary data = want nway;
    class id status flag;
    var startdt enddt;
    output out=want2 min(startdt)=min_startdt max(enddt)=max_enddt;
run;

 

 

--
Paige Miller

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1404 views
  • 1 like
  • 4 in conversation