I have a large data set with this structure:
ID | DATE |
16602 | 07/20/2015 |
16602 | 07/25/2015 |
16602 | 07/28/2015 |
20302 | 03/16/2016 |
20302 | 03/18/2016 |
20302 | 03/25/2016 |
20302 | 02/18/2015 |
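I want to add a CLUSTER number that starts a new cluster whenever the ID changes or the date is more than 7 days after the first date of the current cluster:
ID | DATE | CLUSTER |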
16602 | 07/20/2015 | 1 |
16602 | 07/25/2015 | 1 |
16602 | 07/28/2015 | 2 |
20302 | 03/16/2016 | 3 |
20302 | 03/18/2016 | 3 |
20302 | 03/25/2016 | 4 |
20302 | 02/18/2017 | 5 |
I would do it similarly to what mkeintz suggested, with a couple of small but important enhancements. Your data set needs to be sorted by ID and DATE, and the DATA step should use BY ID DATE; so that SAS stops with an error if the dates are not in order within an ID:
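data HAVE;
    /* ID is read as text from columns 1-5, DATE as a SAS date from column 7 */
    input ID $1-5 @7 DATE mmddyy10.;
    format DATE mmddyy10.;
lines;
16602 07/20/2015
16602 07/25/2015
16602 07/28/2015
20302 03/16/2016
20302 03/18/2016
20302 03/25/2016
20302 02/18/2015
;
/* The cluster logic needs the dates in order within each ID */
proc sort data=HAVE;
    by ID DATE;
run;
data WANT (drop=FIRSTDATE);
    set HAVE;
    by ID DATE;          /* also makes SAS check the sort order */
    retain FIRSTDATE;    /* first date of the current cluster */
    /* New cluster for a new ID, or when DATE is more than 7 days after the cluster start */
    if first.ID or DATE-FIRSTDATE>7 then
    do;
        FIRSTDATE = DATE;
        CLUSTER+1;       /* sum statement: CLUSTER is retained and starts at 0 */
    end;
run;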
Hope this helps.
This is a common request. You want to increment the cluster number whenever a new ID begins or the date is more than 7 days after the starting date of the current cluster.
To do this in a SAS DATA step, you have to keep (i.e. "retain") the starting date of the current cluster so that it can be compared to the incoming date:
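/* Assumes HAVE is already sorted by ID and DATE */
data want (drop=startdate);
    set have;
    by id;
    retain startdate;    /* starting date of the current cluster */
    /* New cluster for a new ID, or when the date is more than 7 days past startdate */
    if first.id=1 or date-7 > startdate then do;
        cluster+1;
        startdate=date;
    end;
run;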
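Both versions work because a SAS date value is just the number of days since January 1, 1960, so subtracting two dates gives the gap in days, and DATE-FIRSTDATE>7 and date-7 > startdate are the same test. A small illustrative sketch using dates from the sample data; the variable names GAP, NEW_A and NEW_B are made up for this example and are not part of either answer:

```sas
/* Sketch: SAS dates are day counts, so subtraction gives the gap in days */
data _null_;
    startdate = '20JUL2015'd;               /* first date of the cluster */
    do date = '25JUL2015'd, '28JUL2015'd;   /* two later dates for the same ID */
        gap   = date - startdate;           /* gap in days: 5, then 8 */
        new_a = (date - startdate > 7);     /* first answer's test (1 = new cluster) */
        new_b = (date - 7 > startdate);     /* mkeintz's test (1 = new cluster) */
        put date= mmddyy10. gap= new_a= new_b=;
    end;
run;
```

For 07/25/2015 the gap is 5 days, so both tests give 0 and the row stays in the current cluster; for 07/28/2015 the gap is 8 days, so both give 1 and a new cluster starts, which matches the CLUSTER column in the desired output.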