I have a large data set with this structure:
| ID | DATE |
| 16602 | 07/20/2015 |
| 16602 | 07/25/2015 |
| 16602 | 07/28/2015 |
| 20302 | 03/16/2016 |
| 20302 | 03/18/2016 |
| 20302 | 03/25/2016 |
| 20302 | 02/18/2015 |
| ID | DATE | CLUSTER |
| 16602 | 07/20/2015 | 1 |
| 16602 | 07/25/2015 | 1 |
| 16602 | 07/28/2015 | 2 |
| 20302 | 03/16/2016 | 3 |
| 20302 | 03/18/2016 | 3 |
| 20302 | 03/25/2016 | 4 |
| 20302 | 02/18/2017 | 5 |
I would do it similar to what mkeintz suggested with a couple small but important enhancements. You need your data set to be sorted by ID and DATE, and also have by ID DATE; in the data step:
data HAVE;
input ID $1-5 @7 DATE mmddyy10.;
format DATE mmddyy10.;
lines;
16602 07/20/2015
16602 07/25/2015
16602 07/28/2015
20302 03/16/2016
20302 03/18/2016
20302 03/25/2016
20302 02/18/2015
;
proc sort data=HAVE;
by ID DATE;
run;
data WANT (drop=FIRSTDATE);
set HAVE;
by ID DATE;
retain FIRSTDATE;
if first.ID or DATE-FIRSTDATE>7 then
do;
FIRSTDATE = DATE;
CLUSTER+1;
end;
run;
Hope this helps.
This is a common request. You want to increment the cluster number whenever
To do this in a sas DATA step, you have to keep (i.e. "retain") the starting date of the current cluster, to be compared to the incoming date:
data want (drop=startdate);
set have;
by id;
retain startdate;
if first.id=1 or date-7 > startdate then do;
cluster+1;
startdate=date;
end;
run;
I would do it similar to what mkeintz suggested with a couple small but important enhancements. You need your data set to be sorted by ID and DATE, and also have by ID DATE; in the data step:
data HAVE;
input ID $1-5 @7 DATE mmddyy10.;
format DATE mmddyy10.;
lines;
16602 07/20/2015
16602 07/25/2015
16602 07/28/2015
20302 03/16/2016
20302 03/18/2016
20302 03/25/2016
20302 02/18/2015
;
proc sort data=HAVE;
by ID DATE;
run;
data WANT (drop=FIRSTDATE);
set HAVE;
by ID DATE;
retain FIRSTDATE;
if first.ID or DATE-FIRSTDATE>7 then
do;
FIRSTDATE = DATE;
CLUSTER+1;
end;
run;
Hope this helps.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.