Solved: Re: how to check date overlap: last visit's end date is later than this visit's start date

fengyuwuzu · Posted 01-25-2017 10:50 AM

For example, in the following data, for ID A002, visit 1's end date is 3/25/2016, but visit 2's start date is 3/24/2016.

How can I identify such data issues?

ID	visitnum	start_date	end_date
A001	1	8/26/2015	9/3/2015
	2	9/17/2015	9/17/2017
	3	10/2/2015	10/2/2015
	4	10/30/2015	10/30/2015
	5	11/24/2015	11/24/2015
A002	1	3/17/2016	3/25/2016
	2	3/24/2016	3/24/2016
	3	3/31/2016	3/31/2016
	4	4/28/2016	4/28/2016
	5	5/25/2016	5/25/2016

collinelliot · Posted 01-25-2017 10:57 AM

One way is to use a retain to carry the previous end date to the next observation and then flag it. Something like the below:

data have;
    input ID $ visitnum	start_date :mmddyy10. end_date :mmddyy10.;
    datalines;
A001 1 8/26/2015 9/3/2015
A001 2 9/17/2015 9/17/2017
A001 3 10/2/2015 10/2/2015
A001 4 10/30/2015 10/30/2015
A001 5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
A002 2 3/24/2016 3/24/2016
A002 3 3/31/2016 3/31/2016
A002 4 4/28/2016 4/28/2016
A002 5 5/25/2016 5/25/2016
;

data want;  
    retain _lst;
    set have;
    by id visitnum;
    if first.id then do;
        flag = 0;
        _lst = end_date;
    end;
    else do;
        /* overlap: the previous visit ended after this visit started */
        flag = (_lst > start_date);
        _lst = end_date;
    end;
    format start_date end_date mmddyy10.;
run;

fengyuwuzu · Posted 01-25-2017 11:22 AM

Thanks, Collinelliot.

But somehow there is a wrong flag when I ran your code. See the red arrow below.

collinelliot · Posted 01-25-2017 11:25 AM

The previous date is in 2017, which is later than the next date, so the flag seems correct. Unless I don't understand what you want.

fengyuwuzu · Posted 01-25-2017 11:27 AM

You are right. It was a typo (2017 should be 2015). Your code works! Thank you.

collinelliot · Posted 01-25-2017 11:28 AM

You're welcome!

ballardw · Posted 01-25-2017 10:59 AM

The first step would be to ensure that the values are SAS date values and not character. Is that the case?
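
One way to check this is sketched below. The data set `demo` and its variables are made up for illustration and are not from the original data. The VTYPE function returns C for a character variable and N for a numeric one. Note that N only shows the variable is numeric; whether it actually holds SAS date values still has to be confirmed by looking at the values and formats.

```sas
/* Hypothetical demo data: one character "date" and one real SAS date value. */
data demo;
    start_c    = '2016-03-24';                /* character: looks like a date, is not one */
    start_date = input(start_c, yymmdd10.);   /* numeric SAS date value */
    format start_date yymmdd10.;
run;

/* Write each variable's type to the log: C = character, N = numeric. */
data _null_;
    set demo;
    type_c = vtype(start_c);
    type_d = vtype(start_date);
    put type_c= type_d=;
run;
```

PROC CONTENTS on the real data set reports the same type information along with any attached formats.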

fengyuwuzu · Posted 01-25-2017 11:18 AM

My dates are in yyyy-mm-dd format in the real SAS data set, which can be compared directly in either numeric or character format. I did not pay attention when typing the example dates in Excel.

ballardw · Posted 01-26-2017 10:35 AM

@fengyuwuzu wrote:
My dates are in yyyy-mm-dd format in the real SAS data set, which can be compared directly in either numeric or character format. I did not pay attention when typing the example dates in Excel.

So you show example data that is not in the form of the actual data?

And while greater-than or less-than comparisons may work, things like "by how much do they overlap" or any kind of actual difference cannot be computed with either of those values, but can be with SAS date values.
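
As a rough sketch of that point: assuming the real data stores the dates as yyyy-mm-dd character strings in hypothetical variables `start_c` and `end_c` (names invented here, not taken from the thread), INPUT with the YYMMDD10. informat converts them to SAS date values, after which the size of an overlap is a simple subtraction.

```sas
/* Hypothetical input: dates held as yyyy-mm-dd character strings. */
data have_char;
    input ID $ visitnum start_c :$10. end_c :$10.;
    datalines;
A002 1 2016-03-17 2016-03-25
A002 2 2016-03-24 2016-03-24
A002 3 2016-03-31 2016-03-31
;

data overlap;
    set have_char;
    by id;
    /* Convert the character strings to numeric SAS date values. */
    start_date = input(start_c, yymmdd10.);
    end_date   = input(end_c, yymmdd10.);
    /* Call LAG on every row so it always returns the previous row's end date. */
    prev_end = lag(end_date);
    /* The first visit of an ID has no previous visit to compare against. */
    if first.id then prev_end = .;
    /* Days by which the previous visit runs past this visit's start. */
    if prev_end > start_date then overlap_days = prev_end - start_date;
    format start_date end_date prev_end yymmdd10.;
run;
```

With these three rows, only visit 2 gets a value: overlap_days is 1, because the previous visit ended on 2016-03-25 and this one started on 2016-03-24.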

mkeintz · Posted 01-25-2017 11:35 AM

Both of these are untested.

If you only want to flag the second record of the overlap, then a lag function and a by statement are the needed tools.

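data want;
    set have;
    by id ;
    /* call LAG on every row so its queue always holds the previous row's end_date */
    prev_ed = lag(end_date);
    if (first.id=0 and start_date<prev_ed) then flag=1;
    else flag=0;
    drop prev_ed;
run;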

It's a few more lines if you want to flag both the leading and trailing elements of the overlap. In this case, flag is the position of the record within a run of consecutive overlaps (i.e. record K overlaps K+1, which overlaps K+2, yielding flag=1, then 2, then 3):

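data want;
  set have (keep=id);
  by id;
  /* look-ahead: pair each record with the next record's start_date */
  merge have
        have (firstobs=2 keep=start_date rename=(start_date=next_sd));
  /* call LAG on every row so its queue always holds the previous row's end_date */
  prev_ed = lag(end_date);
  /* restart the sequence for each new id */
  if first.id then flag=0;
  if (first.id=0 and start_date<prev_ed) or 
      (end_date>next_sd and last.id=0) then flag+1;
  else flag=0;
  drop prev_ed;
run;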
