I have a large data set with this structure:
| ID | DATE |
| 16602 | 07/20/2015 |
| 16602 | 07/25/2015 |
| 16602 | 07/28/2015 |
| 20302 | 03/16/2016 |
| 20302 | 03/18/2016 |
| 20302 | 03/25/2016 |
| 20302 | 02/18/2015 |
| ID | DATE | CLUSTER |
| 16602 | 07/20/2015 | 1 |
| 16602 | 07/25/2015 | 1 |
| 16602 | 07/28/2015 | 2 |
| 20302 | 03/16/2016 | 3 |
| 20302 | 03/18/2016 | 3 |
| 20302 | 03/25/2016 | 4 |
| 20302 | 02/18/2017 | 5 |
I would do it similar to what mkeintz suggested with a couple small but important enhancements. You need your data set to be sorted by ID and DATE, and also have by ID DATE; in the data step:
data HAVE;
input ID $1-5 @7 DATE mmddyy10.;
format DATE mmddyy10.;
lines;
16602 07/20/2015
16602 07/25/2015
16602 07/28/2015
20302 03/16/2016
20302 03/18/2016
20302 03/25/2016
20302 02/18/2015
;
proc sort data=HAVE;
by ID DATE;
run;
data WANT (drop=FIRSTDATE);
set HAVE;
by ID DATE;
retain FIRSTDATE;
if first.ID or DATE-FIRSTDATE>7 then
do;
FIRSTDATE = DATE;
CLUSTER+1;
end;
run;
Hope this helps.
This is a common request. You want to increment the cluster number whenever
To do this in a sas DATA step, you have to keep (i.e. "retain") the starting date of the current cluster, to be compared to the incoming date:
data want (drop=startdate);
set have;
by id;
retain startdate;
if first.id=1 or date-7 > startdate then do;
cluster+1;
startdate=date;
end;
run;
I would do it similar to what mkeintz suggested with a couple small but important enhancements. You need your data set to be sorted by ID and DATE, and also have by ID DATE; in the data step:
data HAVE;
input ID $1-5 @7 DATE mmddyy10.;
format DATE mmddyy10.;
lines;
16602 07/20/2015
16602 07/25/2015
16602 07/28/2015
20302 03/16/2016
20302 03/18/2016
20302 03/25/2016
20302 02/18/2015
;
proc sort data=HAVE;
by ID DATE;
run;
data WANT (drop=FIRSTDATE);
set HAVE;
by ID DATE;
retain FIRSTDATE;
if first.ID or DATE-FIRSTDATE>7 then
do;
FIRSTDATE = DATE;
CLUSTER+1;
end;
run;
Hope this helps.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.