Sorting overlapping dates and address histories

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Sorting overlapping dates and address histories

Hi all,

 

I'm currently trying to figure out how to create a dataset that has multiple rows per user, based on their address history. I have a dataset that has multiple rows per person with unique addresses, but currently their dates overlap. I'd like to clear the overlaps by giving start dates priority. The answers I've found are pretty close, but I can't seem to drop the overlapping end dates.

 

What I have:

ID    DateStart  DateEnd Address 

1      05/2000    09/2005   XXXXX

1      06/2003    08/2010   YYY

1      02/2006    03/2008   GGG

1      08/2011    05/2016   TTT

....

What I'd like to have:

ID    DateStart  DateEnd Address 

1      05/2000    06/2003   XXXXX

1      06/2003    02/2006   YYY

1      02/2006    03/2008   GGG

1      08/2011    05/2016    TTT

 

I've tried to include the major date range overlap types, so in the second row, the new address occurs before the end of the first row. The third row shows one that is inside of the second row, but I don't want the remainder of the second row, just an end when the third row ends, and lastly the fourth row shows a gap type date range, which is fine.

 

 

Thanks for all your help!

 

 


Accepted Solutions
Solution
‎11-30-2016 05:10 PM
Respected Advisor
Posts: 3,837

Re: Sorting overlapping dates and address histories

1. Sort by ID and descending start date.

2. retain the start date from the previous record

3. set end date as min(retained start date, end date)

 

View solution in original post


All Replies
Solution
‎11-30-2016 05:10 PM
Respected Advisor
Posts: 3,837

Re: Sorting overlapping dates and address histories

1. Sort by ID and descending start date.

2. retain the start date from the previous record

3. set end date as min(retained start date, end date)

 

Occasional Contributor
Posts: 5

Re: Sorting overlapping dates and address histories

[ Edited ]

Hi Patrick,

 

Thanks for the suggestion, got it working!

Contributor
Posts: 22

Re: Sorting overlapping dates and address histories

You can use the lag function like the code below. I didn't know if your dates were in character or were numeric with a date format of 05/2000. So, I wrote it as if it was character. My output doesn't have the leading zero on the month though, so you will have to change that. If you can find the right format in SAS, you can do put(date,fmt.) to get the output you want.

My copy/paste from SAS really screwed up the formatting - sorry.

 

data have;

    input ID DateStart $ DateEnd $ Address $;

    lines;

      1 05/2000 09/2005 XXXXX

      1 06/2003 08/2010 YYY

      1 02/2006 03/2008 GGG

      1 08/2011 05/2016 TTT

      2 04/2001 08/2003 AAA

      2 08/2003 11/2004 BBB

      2 11/2004 03/2006 CCC

      ;

run;

 

/* If input data is character. Can skip if input data is numeric with date format */

data have2;

   set have;

   StartDate = mdy(substr(DateStart,1,2),1,substr(DateStart,4,4));

   EndDate = mdy(substr(DateEnd,1,2),1,substr(DateEnd,4,4));

   format StartDate EndDate mmddyy10.;

run;

 

proc sort data = have2;

   by ID StartDate;

run;

 

data want;

   set have2;

   by id;

      LastID = lag1(ID);

      LastStartDate = lag1(StartDate);

      LastEndDate = lag1(EndDate);

      LastAddress = lag1(Address);

   if ^first.id then do;

           KeepStartDate = LastStartDate;

           KeepAddress = LastAddress;

     if LastEndDate < StartDate then do;

               KeepEndDate = LastEndDate;

     end;

     else do;

               KeepEndDate = StartDate;

     end;

     output;

   end;

   if last.id then do;

           KeepStartDate = StartDate;

           KeepEndDate = EndDate;

           KeepAddress = Address;

   output;

  end;

  format LastStartDate LastEndDate KeepStartDate KeepEndDate mmddyy10.;

run;

 

/* if output needs to be character, not numeric with date format */

data want2;

   set want (keep = ID KeepStartDate KeepEndDate KeepAddress);

   StartDate = cats(month(KeepStartDate),'/', year(KeepStartDate));

   EndDate = cats(month(KeepEndDate),'/', year(KeepEndDate));

      Address = KeepAddress;

   keep ID StartDate EndDate Address;

run;

 

proc print data = have;

run;

 

proc print data = have2;

run;

 

proc print data = want;

run;

 

proc print data = want2;

run;

Contributor
Posts: 22

Re: Sorting overlapping dates and address histories

I was going to use min, but for some reason, I thought sometimes you would like the previous address as well. But, in the end, you didn't. Also, be careful that you are matching up the IDs when you do the minimum.

Respected Advisor
Posts: 4,997

Re: Sorting overlapping dates and address histories

Patrick's approach is good, subject to one question.

 

What is actually contained in your date variables?  Are they character strings (what you see is what you get)?  Are they SAS dates with an unusual format to them?

Occasional Contributor
Posts: 5

Re: Sorting overlapping dates and address histories

Hi,
They're in SAS date formats, just yymmn6. format.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 270 views
  • 1 like
  • 4 in conversation