BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Manhort
Obsidian | Level 7

Hello,

 

I need to filter repeated measures if the dates (by id) are not 60 days apart. There are hundreds of id's in the dataset; I am showing three here.

Person 1 can be reduced to a single date (because they're the same). Person 2 can be reduced to a single date (I figure choosing the earliest date makes the most sense). Person 3 has 2 dates that are far enough apart to keep, but a third that would cause that row to be dropped (again wanting to keep the earliest date).  Dates can be repeated and a person can have any number of dates from 1-10 (although no one has 7-9 oddly enough).

 

 

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016

3 3/20/2016
3 9/6/2016
;
RUN;

 

 

DATA want;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;

1 2/20/2016
2 1/25/2016
3 3/18/2016
3 9/6/2016

;
RUN;

 

 

 

Here's my logic for comparing dates, and an example of a person with three sorted dates:

 

Case1:

If date2-date1 <= 60, keep date1 and delete date2       For the next comparison date1 would be used again.

If date3-date1 <=60, keep date1 delete date3

====================================

 

Case2:

If date2-date1 >= 60, keep both dates(rows)                     At this point, when keeping both dates, the next comparison must be done using the latest date.

If date3-date2 >=60, keep both dates              In this situation all three dates are retained. But had date3-date2 been <= 60 I would've dropped date3.

 

 

Any help is appreciated.

 



 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id).   You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc.  That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

 

 

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
  do until (_nxt_id^=id or date+60<_nxt_date);
    merge have
          have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
    if _init_date=. then _init_date=date;
  end;
  date=_init_date;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

This is a case where you have to look ahead one observation to see if the upcoming gap is more than 60 days (or if the next observation starts a new id).   You can do this via a MERGE statement where you merge observation1 with observation2 (with ID and DATE variables renamed), and then merge observation 2 with observation 3, etc.  That is supported via the "FIRSTOBS=2" option in one of the objects of the MERGE statement below:

 

 

DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
1 2/20/2016
1 2/20/2016
2 1/25/2016
2 1/28/2016
2 2/4/2016
2 2/4/2016
3 3/18/2016
3 3/20/2016
3 9/6/2016
;
RUN;

data want (drop=_:);
  do until (_nxt_id^=id or date+60<_nxt_date);
    merge have
          have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
    if _init_date=. then _init_date=date;
  end;
  date=_init_date;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Manhort
Obsidian | Level 7
Thanks again for the help @mkeintz

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1498 views
  • 1 like
  • 2 in conversation