DATA Step, Macro, Functions and more

Collapsing Dates across rows of observations with a twist.

Reply
New Contributor
Posts: 3

Collapsing Dates across rows of observations with a twist.

Hi everyone,

 

I have an input data that looks like this. The data has two subjects with ID1 and ID2

 

ID     Start_date      End_date

1       1/1/2010         4/1/2010

1       4/24/2010      6/1/2010

1      9/1/2010       10/15/2010 

2       3/1/2010       5/1/2010

2       6/1/2010      10/1/2010

 

Basically, I want to collapse the multiple start and end dates for each subject into one contiguous start and end date per each subject. However, there is a twist.  If the gap between the end date and next start date for the subject is greater than 45 days, then the dates are not collapsed for that row of observation.  I have highlighted the desired start and end dates for each subject.

 

The output dataset should look like

ID     Start_date      End_date

1       1/1/2010         6/1/2010

1      9/1/2010       10/15/2010

2       3/1/2010       10/1/2010

 

Note that subject 1 has a gap greater than 45 days (from 6/1/2010 to 9/1/2010), so the row from 9/1/2010  to 10/15/2010 is not collapsed with the first two rows for subject 1.

 

Subject 2 has no gap greater than 45 days, so his two rows of start and end dates are collapsed. 

 

Thank you so much for your help.

gary

 

Super User
Super User
Posts: 7,997

Re: Collapsing Dates across rows of observations with a twist.

Posted in reply to pegasus1225

Post test data in the form of a datastep!

As such I am not typing that in to test this, and I assume that is a date.

data want;
  set have;
  retain start_date_ end_date_;
  by id;
  if first.id or start_date_=. then start_date_=start_date;
  else do;
    if last.id or end_date-lag(end_date) > 45 then do;
      end_date_=end_date;
      output;
      start_date_=.;
      end_date_=.;
    end;
   end;
run;
      

Note the output variables - the want, is with a _ after.

Regular Contributor
Regular Contributor
Posts: 162

Re: Collapsing Dates across rows of observations with a twist.

@RW9 I'm not sure that quite works.

 

If I do this:

Data Input;
  input id start_date:mmddyy10. end_date:mmddyy10.;
  datalines;
1 1/1/2010 4/1/2010
1 4/24/2010 6/1/2010
1 9/1/2010 10/15/2010 
2 3/1/2010 5/1/2010
2 6/1/2010 10/1/2010
;
run;

data output;
  set input;
  retain start_date_ end_date_;
  by id;
  if first.id or start_date_=. then start_date_=start_date;
  else do;
    if last.id or end_date-lag(end_date) > 45 then do;
      end_date_=end_date;
      output;
      start_date_=.;
      end_date_=.;
    end;
   end;
run;

proc print data = output noobs;
	var id start_date_ end_date_;
	format start_date_ date9. end_date_ date9.;
run;

I get this:

 

                                       The SAS System      

                                           start_
                                  id        date_    end_date_

                                   1    01JAN2010    15OCT2010
                                   2    01MAR2010    01OCT2010

which I don't think is exactly what OP wants.  It seems to skip over the 6/1 to 9/1 break for ID 1 in which OP is interested. 

Super User
Posts: 5,518

Re: Collapsing Dates across rows of observations with a twist.

Posted in reply to pegasus1225

I think it would be easiest for you to follow the logic if you were to break this into two steps (even though it can be done in one).  Here is an example of the first step.  Assuming your data is already sorted by ID START_DATE:

 

data have2;

set have;

by id;

prior_end = lag(end_date);

if first.id or (start_date - prior_end) > 45 then grouping + 1;

run;

 

Then you can take a look at the result, and determine whether GROUPING matches your expectations.  If it does, continue with the second step:

 

data want;

set have2;

by id grouping;

if first.grouping then new_start = start_date;

retain new_start;

drop start_date;

rename new_start = start_date;

if last.grouping;

run;

 

Regular Contributor
Regular Contributor
Posts: 162

Re: Collapsing Dates across rows of observations with a twist.

[ Edited ]
Posted in reply to Astounding

Using the data from above:

 

data have2;
	set input;
	by id;
	prior_end = lag(end_date);
	if first.id or (start_date - prior_end) > 45 then grouping + 1;
run;

data want;
	set have2;
	by id grouping;
	if first.grouping then new_start = start_date;
	retain new_start;
	drop start_date;
	rename new_start = start_date;
	if last.grouping;
run;

proc print data = want noobs;
	var id start_date end_date;
	format start_date date9. end_date date9.;
run;

Seems to yield

                                         The SAS System 

                                           start_
                                  id         date     end_date

                                   1    01JAN2010    01JUN2010
                                   1    01SEP2010    15OCT2010
                                   2    01MAR2010    01OCT2010

 

 

I wish I understood either of the above approaches.  Lol. 

Super User
Posts: 5,518

Re: Collapsing Dates across rows of observations with a twist.

@HB,

 

To understand how this solution works, inspect the intermediate data set more carefully.  See if  you understand how GROUPING is being created.  (Note that it might in theory be the same for 3 or more observations, not just for two observations.)

 

Once you understand what is in GROUPING, think about how you could use GROUPING to create the final data set.  Then compare your thoughts to the second DATA part of the solution.

Regular Contributor
Regular Contributor
Posts: 162

Re: Collapsing Dates across rows of observations with a twist.

Posted in reply to Astounding

@Astounding

 

Left to my own devices, my approach would have been to restructure the data to look like

 

Data AlternateInput;
  input id mydate:mmddyy10. datetype;
  datalines;
1 1/1/2010 start
1 4/1/2010 end
1 4/24/2010 start
1 6/1/2010 end
1 9/1/2010 start
1 10/15/2010 end 
2 3/1/2010 start
2 5/1/2010 end
2 6/1/2010 start
2 10/1/2010 end
;

and then try to figure out how to walk the ID's to find the periods.  Probably best I'm not left to solve it myself. Ha.

Trusted Advisor
Posts: 1,022

Re: Collapsing Dates across rows of observations with a twist.

Posted in reply to pegasus1225

Here is a single DATA step processs that takes advantage of executing lag functions selectively, as well as a self-merge with FIRSTOBS=2:

 

data have;
 input ID     (Start_date      End_date) (:mmddyy10.);
 format start_date end_date date9.;
datalines;
1       1/1/2010         4/1/2010
1       4/24/2010      6/1/2010
1      9/1/2010       10/15/2010 
2       3/1/2010       5/1/2010
2       6/1/2010      10/1/2010
run;


data want (drop=next_:);

  merge have
        have (firstobs=2 keep=id start_date rename=(id=next_id start_date=next_start));

  if next_id^=id or next_start>end_date+45;

  start_date=lag(next_start);
  if start_date=. then set have (keep=start_date obs=1);
run;

 

 

Notes:

  1. The merge statement matches record N of HAVE with record N+1 (yielding new vars NEXT_ID and NEXT_START.
  2. The subsetting IF statement only allow records from the end of a qualified period - therefore providing the desired END_DATE
  3. The start_date=lag(next_date) only update the lag(next_date) queue when a complete period has been identified. 
  4. The lag(next_date) value is missing for the first desired period, so observation 1 is re-read only for that instance.
New Contributor
Posts: 3

Re: Collapsing Dates across rows of observations with a twist.

Posted in reply to pegasus1225

Thank you 

 

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 122 views
  • 0 likes
  • 5 in conversation