BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fengyuwuzu
Pyrite | Level 9

For example, in the following data, for ID A002, visit 1's end date is 3/25/2016, but visit 2's start date is 3/24/2016.

How to identify such data issues? 

 

ID visitnum start_date end_date
A001 1 8/26/2015 9/3/2015
  2 9/17/2015 9/17/2017
  3 10/2/2015 10/2/2015
  4 10/30/2015 10/30/2015
  5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
  2 3/24/2016 3/24/2016
  3 3/31/2016 3/31/2016
  4 4/28/2016 4/28/2016
  5 5/25/2016 5/25/2016

nex

1 ACCEPTED SOLUTION

Accepted Solutions
collinelliot
Barite | Level 11

One way is to use a retain to carry the previous end date to the next observation and then flag it. Something like the below:

 

data have;
    input ID $ visitnum	start_date :mmddyy10. end_date :mmddyy10.;
    datalines;
A001 1 8/26/2015 9/3/2015
A001 2 9/17/2015 9/17/2017
A001 3 10/2/2015 10/2/2015
A001 4 10/30/2015 10/30/2015
A001 5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
A002 2 3/24/2016 3/24/2016
A002 3 3/31/2016 3/31/2016
A002 4 4/28/2016 4/28/2016
A002 5 5/25/2016 5/25/2016
;

data want;  
    retain _lst;
    set have;
    by id visitnum;
    if first.id then do;
        flag = 0;
        _lst = end_date;
    end;
    else do;
        flag = (_lst > end_date);
        _lst = end_date;
    end;
    format start_date end_date mmddyy10.;
run;

View solution in original post

9 REPLIES 9
collinelliot
Barite | Level 11

One way is to use a retain to carry the previous end date to the next observation and then flag it. Something like the below:

 

data have;
    input ID $ visitnum	start_date :mmddyy10. end_date :mmddyy10.;
    datalines;
A001 1 8/26/2015 9/3/2015
A001 2 9/17/2015 9/17/2017
A001 3 10/2/2015 10/2/2015
A001 4 10/30/2015 10/30/2015
A001 5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
A002 2 3/24/2016 3/24/2016
A002 3 3/31/2016 3/31/2016
A002 4 4/28/2016 4/28/2016
A002 5 5/25/2016 5/25/2016
;

data want;  
    retain _lst;
    set have;
    by id visitnum;
    if first.id then do;
        flag = 0;
        _lst = end_date;
    end;
    else do;
        flag = (_lst > end_date);
        _lst = end_date;
    end;
    format start_date end_date mmddyy10.;
run;
fengyuwuzu
Pyrite | Level 9

Thanks, Collinelliot. 

But somehow there is a wrong flag when I ran your code. see red arrow below. 

 

Capture.PNG

collinelliot
Barite | Level 11

The previous date is in 2017, which is later than the next date, so the flag seems correct. Unless I don't understand what you want.

fengyuwuzu
Pyrite | Level 9

you are right. It was a typo (2017 should be 2015). Your code works! Thank you

ballardw
Super User

First would be ensure that the values are SAS date values and not character. Is that the case?

fengyuwuzu
Pyrite | Level 9
My dates are in yyyy-mm-dd format in the real SAS dateset, which can be compared differectly either in numeric format or character format. I did not pay attention when typing the example dates in excel.
ballardw
Super User

@fengyuwuzu wrote:
My dates are in yyyy-mm-dd format in the real SAS dateset, which can be compared differectly either in numeric format or character format. I did not pay attention when typing the example dates in excel.

So you show example data that is not in the form of the actual data?

And while comparisons of greater than or less than may work things like "by how much to they overlap" or any kind of actual difference cannot be done with either of those values but can be done with SAS date values.

mkeintz
PROC Star

Both of these are untested.

 

If you only want to flag the second record of the overlap then a lag function and a by statemetn are the needed tools.

 

data want;  
    set have;
    by id ;
    if (first.id=0 and start_date<lag(end_date)) then flag=1;
    else flag=0;
run;

 

 

It's a few more lines if you want to flag both the leading and trailing elements of the overlap.  In this case, flag will be the sequence of the record in a consecutive sequence of overlaps  (i.e. record K overlaps K+1 which overlaps K+3, yielding flag=1 then 2 then 3):

 

data want;
  set have (keep=id);
  by id;
  merge have
        have (firstobs=2 keep=start_date rename=(start_date=next_sd));
  if (first.id=0 and start_date<lag(end_date)) or 
      (end_date>next_sd and last.id=0) then flag+1;
  else flag=0;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 8956 views
  • 4 likes
  • 4 in conversation