Obsidian | Level 7

## Deleting rows if dates are a certain time apart within an ID.

Hello,

I need to filter repeated measures if the dates (by id) are not 60 days apart. There are hundreds of id's in the dataset; I am showing three here.

Person 1 can be reduced to a single date (because they're the same). Person 2 can be reduced to a single date (I figure choosing the earliest date makes the most sense). Person 3 has 2 dates that are far enough apart to keep, but a third that would cause that row to be dropped (again wanting to keep the earliest date).  Dates can be repeated and a person can have any number of dates from 1-10 (although no one has 7-9 oddly enough).

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016

3 3/20/2016
3 9/6/2016
;
RUN;

DATA want;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
2 1/25/2016
3 3/18/2016
3 9/6/2016

;
RUN;

Here's my logic for comparing dates, and an example of a person with three sorted dates:

Case1:

If date2-date1 <= 60, keep date1 and delete date2       For the next comparison date1 would be used again.

If date3-date1 <=60, keep date1 delete date3

====================================

Case2:

If date2-date1 >= 60, keep both dates(rows)                     At this point, when keeping both dates, the next comparison must be done using the latest date.

If date3-date2 >=60, keep both dates              In this situation all three dates are retained. But had date3-date2 been <= 60 I would've dropped date3.

Any help is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Deleting rows if dates are a certain time apart within an ID.

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id).   You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc.  That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

``````

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
do until (_nxt_id^=id or date+60<_nxt_date);
merge have
have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
if _init_date=. then _init_date=date;
end;
date=_init_date;
run;``````
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
2 REPLIES 2
PROC Star

## Re: Deleting rows if dates are a certain time apart within an ID.

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id).   You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc.  That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

``````

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
do until (_nxt_id^=id or date+60<_nxt_date);
merge have
have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
if _init_date=. then _init_date=date;
end;
date=_init_date;
run;``````
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: Deleting rows if dates are a certain time apart within an ID.

Thanks again for the help @mkeintz
Discussion stats
• 2 replies
• 915 views
• 1 like
• 2 in conversation