Solved: Deleting rows if dates are a certain time apart within an ID.

Manhort · Posted 03-24-2022 06:50 PM

Hello,

I need to filter repeated measures if the dates (by id) are not 60 days apart. There are hundreds of id's in the dataset; I am showing three here.

Person 1 can be reduced to a single date (because they're the same). Person 2 can be reduced to a single date (I figure choosing the earliest date makes the most sense). Person 3 has 2 dates that are far enough apart to keep, but a third that would cause that row to be dropped (again wanting to keep the earliest date). Dates can be repeated and a person can have any number of dates from 1-10 (although no one has 7-9 oddly enough).

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016

3 3/20/2016
3 9/6/2016
;
RUN;

DATA want;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
2 1/25/2016
3 3/18/2016
3 9/6/2016

;
RUN;

Here's my logic for comparing dates, and an example of a person with three sorted dates:

Case1:

If date2-date1 <= 60, keep date1 and delete date2 For the next comparison date1 would be used again.

If date3-date1 <=60, keep date1 delete date3

====================================

Case2:

If date2-date1 >= 60, keep both dates(rows) At this point, when keeping both dates, the next comparison must be done using the latest date.

If date3-date2 >=60, keep both dates In this situation all three dates are retained. But had date3-date2 been <= 60 I would've dropped date3.

Any help is appreciated.

mkeintz · Posted 03-24-2022 08:17 PM

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id). You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc. That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

 

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
  do until (_nxt_id^=id or date+60<_nxt_date);
    merge have
          have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
    if _init_date=. then _init_date=date;
  end;
  date=_init_date;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 03-24-2022 08:17 PM

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id). You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc. That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

 

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
  do until (_nxt_id^=id or date+60<_nxt_date);
    merge have
          have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
    if _init_date=. then _init_date=date;
  end;
  date=_init_date;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Manhort · Posted 03-24-2022 08:35 PM

Thanks again for the help @mkeintz

Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Catch up on SAS Innovate 2026

Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Re: Deleting rows if dates are a certain time apart within an ID.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away