Hi everyone,
I have data here with 3 ids. I need to create a new variable I'm calling 'clusterID' that goes through the original ids and creates new clusters based on the dates. Date groupings that are 60 days apart should be made a new cluster and counted off. In this example there will be a total of 6 clusterIDs. Person 100 has 1, person 161 has 2, and person 222 has 3. The data is messy and dates are repeated (there are other variables I can't just abandon).
DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
100 09/08/2016
100 09/08/2016
100 09/12/2016
161 06/14/2017
161 06/14/2017
161 06/19/2017
161 06/19/2017
161 08/14/2019
161 08/14/2019
161 08/22/2019
222 01/12/2016
222 05/10/2016
222 05/10/2016
222 05/10/2016
222 05/10/2016
222 03/03/2017
;
RUN;
DATA want;
input id date mmddyy10. clusterid;
format date mmddyy10.;
CARDS;
100 09/08/2016 1
100 09/08/2016 1
100 09/12/2016 1
161 06/14/2017 2
161 06/14/2017 2
161 06/19/2017 2
161 06/19/2017 2
161 08/14/2019 3
161 08/14/2019 3
161 08/22/2019 3
222 01/12/2016 4
222 05/10/2016 5
222 05/10/2016 5
222 05/10/2016 5
222 05/10/2016 5
222 03/03/2017 6
;
RUN;
What I've Tried:
Previously, someone helped me in another situation and gave me code that will separate the dates by 60days and keep just the first date, this is shown in the 'oldwant' table below. I'm unsure if this helps, but it does give start dates for clusters.
data oldwant (drop=_:);
do until (_nxt_id^=id or date+60<_nxt_date);
merge have
have (firstobs=2 keep=id date rename=(id=_nxt_id date=_nxt_date));
if _init_date=. then _init_date=date;
end;
date=_init_date;
run;
Any help is appreciated thanks.
Like this?
data WANT;
set HAVE;
by ID DATE;
if ID=lag(ID) and DATE > lag(DATE)+60 then GROUP+1;
else if first.ID then GROUP+1;
run;
ID | DATE | GROUP |
---|---|---|
100 | 08SEP2016 | 1 |
100 | 08SEP2016 | 1 |
100 | 12SEP2016 | 1 |
161 | 14JUN2017 | 2 |
161 | 14JUN2017 | 2 |
161 | 19JUN2017 | 2 |
161 | 19JUN2017 | 2 |
161 | 14AUG2019 | 3 |
161 | 14AUG2019 | 3 |
161 | 22AUG2019 | 3 |
222 | 12JAN2016 | 4 |
222 | 10MAY2016 | 5 |
222 | 10MAY2016 | 5 |
222 | 10MAY2016 | 5 |
222 | 10MAY2016 | 5 |
222 | 03MAR2017 | 6 |
This creates your want dataset:
data want;
set have;
by id;
if first.id or dif(date) > 60 then clusterid + 1;
run;
DATA have;
input id date mmddyy10.;
format date mmddyy10.;
CARDS;
100 09/08/2016
100 09/08/2016
100 09/12/2016
161 06/14/2017
161 06/14/2017
161 06/19/2017
161 06/19/2017
161 08/14/2019
161 08/14/2019
161 08/22/2019
222 01/12/2016
222 05/10/2016
222 05/10/2016
222 05/10/2016
222 05/10/2016
222 03/03/2017
;
RUN;
data want;
set have;
by id;
retain start;
if first.id or date-start>60 then do;start=date;clusterid+1;end;
drop start;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.