Hi All
data have;
input id $ group (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1-10 1 27FEB22:12:32:00 27FEB22:13:04:00
1-9 1 27FEB22:12:32:00 27FEB22:13:04:00
2-15 1 27FEB22:12:33:00 27FEB22:13:12:00
1-4 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 6 27FEB22:11:34:00 27FEB22:12:39:00
1-9 6 27FEB22:11:34:00 27FEB22:12:39:00
103-21 7 05APR22:12:26:00 05APR22:13:02:00
103-22 7 05APR22:12:26:00 05APR22:13:02:00
102-58 8 05APR22:12:43:00 05APR22:13:06:00
103-22 8 05APR22:12:43:00 05APR22:13:06:00
103-17 9 05APR22:11:26:00 05APR22:12:34:00
103-19 9 05APR22:11:26:00 05APR22:12:34:00
103-19 10 05APR22:11:27:00 05APR22:12:52:00
103-21 10 05APR22:11:27:00 05APR22:12:52:00
;
run;
Expected Output :
1. We are forming the variable new_group based on a time span covered by all members of a group.
(time span is defined as the earliest start and latest end of all the observations in a group)
2. all members of a group will be assigned to the same new_group.
3. The first group will be assigned newgroup=1.
4. If the earliest start_time or last end_time of a group falls in between already formed newgroup's earliest start_time and last end_time and also id's of the group also match newgroup then it should be included with the newgroup. Otherwise, it should create a new newsgroup.
Note:
1.Eventhough group 2 and 3 have same ids , they should not come under same new_group because both groups are having different time and they did not intersect/overlap.
2. Single id value should not come under multiple new_groups.
I'm not sure I grasp the method of creating this new group variable.
The first 3 rows have new_group=1, based on what logic? Please explain.
The next 4 rows have new_group=2, based on what logic? Please explain.
So we only have to look at the next group, we would never have to look at the group that is five groups away?
Your code uses gtmds_uid as key of the hash, but this variable does not exist in the example dataset you posted.
Does id play any role in the building of new groups, or is it only the date that defines a group?
You perfectly contradict yourself. In 1) you say that same id's should belong to different newgroups (under certain conditions), but in 2) you say that a single id should always be in the same newgroup.
Please make up your mind.
1-6 and 1- 7 (Group 4) need to be set to be with a separate newgroup as this window 25-02-2022 09:18:00 to 25-02-2022 09:33:00 happens at a separate time window which is not overlapping with any other newgroups time window.
Note: 1. If the time window of the group intersects/overlaps with newgroup and ids match, then the group will be added with the same newgroup. 2. If ids are not matching or time windows are not overlapping, then that group will be generated with newgroup number
Hi All,
The data needs to be grouped based on conditions.
data have;
input id $ group (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1-10 1 27FEB22:12:32:00 27FEB22:13:04:00
1-9 1 27FEB22:12:32:00 27FEB22:13:04:00
2-15 1 27FEB22:12:33:00 27FEB22:13:12:00
1-4 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 6 27FEB22:11:34:00 27FEB22:12:39:00
1-9 6 27FEB22:11:34:00 27FEB22:12:39:00
103-21 7 05APR22:12:26:00 05APR22:13:02:00
103-22 7 05APR22:12:26:00 05APR22:13:02:00
102-58 8 05APR22:12:43:00 05APR22:13:06:00
103-22 8 05APR22:12:43:00 05APR22:13:06:00
103-17 9 05APR22:11:26:00 05APR22:12:34:00
103-19 9 05APR22:11:26:00 05APR22:12:34:00
103-19 10 05APR22:11:27:00 05APR22:12:52:00
103-21 10 05APR22:11:27:00 05APR22:12:52:00
;
run;
Expected Output:
Grouping needs to be done based on below conditions.
1. We are forming the variable new_group based on a time span covered by all members of a group.
(time span is defined as the earliest start and latest end of all the observations in a group)
2. all members of a group will be assigned to the same new_group.
3. The first group will be assigned newgroup=1.
4. If the earliest start_time or last end_time of a group falls in between already formed newgroup's earliest start_time and last end_time and also id's of the group also match newgroup then it should be included with the newgroup. Otherwise, it should create a new newsgroup.
Note:
1.Eventhough group 2 and 3 have same ids , they should not come under same new_group because both groups are having different time and they did not intersect/overlap.
2. Single id value should not come under multiple new_groups.
Please help on this.
Thank you.
Why is GROUP 5 not in NEWGROUP 2 since it overlaps?
The ids in group 5 (1-7 and 1-8) are not in Newgroup -2. So only it is not in newgroup2 though time window overlap.
New groups should be assigned based on both conditions checking ids present in it and also time overlap.
Thank you
Please don't double post questions.
I'm not sure about the interaction between IDs and formation of new groups because some of the info shared in this post seems inconsistent or contradictory. However, if the current group's range of time overlaps the new group's range of time, you can get the desired outcome.
* determine each group's min start and max end;
proc summary data=have nway;
class group;
output out=grp_range (drop=_:) min(start_time)=min_grp_start max(end_time)=max_grp_end;
run;
* determine new groups based only on overlaps of time;
data new_groups;
set grp_range;
retain new_group 0 min_new_grp_start max_new_grp_end;
format min_new_grp_start max_new_grp_end e8601dt19.;
* test if current group overlaps the current new_group;
* if not, then create a new group and reset the new group dates;
if not (min_grp_start <= max_new_grp_end and max_grp_end >= min_new_grp_start) then do;
new_group + 1;
min_new_grp_start = min_grp_start;
max_new_grp_end = max_grp_end;
end;
* dynamically calc new group min and max;
min_new_grp_start = min(min_new_grp_start, min_grp_start);
max_new_grp_end = max(max_new_grp_end, max_grp_end);
run;
Of course, a much larger sample is needed to fully vet this logic.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.