For example, in the following data, for ID A002, visit 1's end date is 3/25/2016, but visit 2's start date is 3/24/2016.
How to identify such data issues?
ID | visitnum | start_date | end_date |
A001 | 1 | 8/26/2015 | 9/3/2015 |
2 | 9/17/2015 | 9/17/2017 | |
3 | 10/2/2015 | 10/2/2015 | |
4 | 10/30/2015 | 10/30/2015 | |
5 | 11/24/2015 | 11/24/2015 | |
A002 | 1 | 3/17/2016 | 3/25/2016 |
2 | 3/24/2016 | 3/24/2016 | |
3 | 3/31/2016 | 3/31/2016 | |
4 | 4/28/2016 | 4/28/2016 | |
5 | 5/25/2016 | 5/25/2016 |
nex
One way is to use a retain to carry the previous end date to the next observation and then flag it. Something like the below:
data have;
input ID $ visitnum start_date :mmddyy10. end_date :mmddyy10.;
datalines;
A001 1 8/26/2015 9/3/2015
A001 2 9/17/2015 9/17/2017
A001 3 10/2/2015 10/2/2015
A001 4 10/30/2015 10/30/2015
A001 5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
A002 2 3/24/2016 3/24/2016
A002 3 3/31/2016 3/31/2016
A002 4 4/28/2016 4/28/2016
A002 5 5/25/2016 5/25/2016
;
data want;
retain _lst;
set have;
by id visitnum;
if first.id then do;
flag = 0;
_lst = end_date;
end;
else do;
flag = (_lst > end_date);
_lst = end_date;
end;
format start_date end_date mmddyy10.;
run;
One way is to use a retain to carry the previous end date to the next observation and then flag it. Something like the below:
data have;
input ID $ visitnum start_date :mmddyy10. end_date :mmddyy10.;
datalines;
A001 1 8/26/2015 9/3/2015
A001 2 9/17/2015 9/17/2017
A001 3 10/2/2015 10/2/2015
A001 4 10/30/2015 10/30/2015
A001 5 11/24/2015 11/24/2015
A002 1 3/17/2016 3/25/2016
A002 2 3/24/2016 3/24/2016
A002 3 3/31/2016 3/31/2016
A002 4 4/28/2016 4/28/2016
A002 5 5/25/2016 5/25/2016
;
data want;
retain _lst;
set have;
by id visitnum;
if first.id then do;
flag = 0;
_lst = end_date;
end;
else do;
flag = (_lst > end_date);
_lst = end_date;
end;
format start_date end_date mmddyy10.;
run;
Thanks, Collinelliot.
But somehow there is a wrong flag when I ran your code. see red arrow below.
The previous date is in 2017, which is later than the next date, so the flag seems correct. Unless I don't understand what you want.
you are right. It was a typo (2017 should be 2015). Your code works! Thank you
You're welcome!
First would be ensure that the values are SAS date values and not character. Is that the case?
@fengyuwuzu wrote:
My dates are in yyyy-mm-dd format in the real SAS dateset, which can be compared differectly either in numeric format or character format. I did not pay attention when typing the example dates in excel.
So you show example data that is not in the form of the actual data?
And while comparisons of greater than or less than may work things like "by how much to they overlap" or any kind of actual difference cannot be done with either of those values but can be done with SAS date values.
Both of these are untested.
If you only want to flag the second record of the overlap then a lag function and a by statemetn are the needed tools.
data want;
set have;
by id ;
if (first.id=0 and start_date<lag(end_date)) then flag=1;
else flag=0;
run;
It's a few more lines if you want to flag both the leading and trailing elements of the overlap. In this case, flag will be the sequence of the record in a consecutive sequence of overlaps (i.e. record K overlaps K+1 which overlaps K+3, yielding flag=1 then 2 then 3):
data want;
set have (keep=id);
by id;
merge have
have (firstobs=2 keep=start_date rename=(start_date=next_sd));
if (first.id=0 and start_date<lag(end_date)) or
(end_date>next_sd and last.id=0) then flag+1;
else flag=0;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.