BookmarkSubscribeRSS Feed
Sastech
Fluorite | Level 6

Hi All 

 

data have;
input id $ group (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1-10 1 27FEB22:12:32:00 27FEB22:13:04:00
1-9 1 27FEB22:12:32:00 27FEB22:13:04:00
2-15 1 27FEB22:12:33:00 27FEB22:13:12:00
1-4 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 6 27FEB22:11:34:00 27FEB22:12:39:00
1-9 6 27FEB22:11:34:00 27FEB22:12:39:00
103-21 7 05APR22:12:26:00 05APR22:13:02:00
103-22 7 05APR22:12:26:00 05APR22:13:02:00
102-58 8 05APR22:12:43:00 05APR22:13:06:00
103-22 8 05APR22:12:43:00 05APR22:13:06:00
103-17 9 05APR22:11:26:00 05APR22:12:34:00
103-19 9 05APR22:11:26:00 05APR22:12:34:00
103-19 10 05APR22:11:27:00 05APR22:12:52:00
103-21 10 05APR22:11:27:00 05APR22:12:52:00
;
run;

 


Expected Output :

 

Sastech_0-1650560080888.png

 

 

1. We are forming the variable new_group based on a time span covered by all members of a group.
(time span is defined as the earliest start and latest end of all the observations in a group)
2. all members of a group will be assigned to the same new_group.
3. The first group will be assigned newgroup=1.
4. If the earliest start_time or last end_time of a group falls in between already formed newgroup's earliest start_time and last end_time and also id's of the group also match newgroup then it should be included with the newgroup. Otherwise, it should create a new newsgroup.

 

Note:
1.Eventhough group 2 and 3 have same ids , they should not come under same new_group because both groups are having different time and they did not intersect/overlap.
2. Single id value should not come under multiple new_groups.

 

 

14 REPLIES 14
PaigeMiller
Diamond | Level 26

I'm not sure I grasp the method of creating this new group variable.

 

The first 3 rows have new_group=1, based on what logic? Please explain.

The next 4 rows have new_group=2, based on what logic? Please explain.

--
Paige Miller
Sastech
Fluorite | Level 6
The first 3 rows have new_group=1 because group 1 of ids does not match with the any other group of ids so its starts with new_group 1

The next 4 rows have new_group=2 beacuse from group 2 and 3 the id 1-5 is common in group 2 and 3 so its coming under the new_group 2.
From group 3 and 4 the id 1-6 is common in the both the group but time span is different so its not joining into same new_group 2 and its new_group 3
From group 4 and 5 the id 1-7 is common in the both the group but time span is different so its not joining into same new_group 3 and its new_group 4

PaigeMiller
Diamond | Level 26

So we only have to look at the next group, we would never have to look at the group that is five groups away?

--
Paige Miller
Sastech
Fluorite | Level 6
we have to look next common ids from one group to other group. if we see group 7 and the id 103 -21 is match with group 10 and coming under the same newgroup 5
Sastech
Fluorite | Level 6
Hi @PaigeMiller
except timespan everything is working as expected
how to alter the the code for across the days.

data out;
if 0 then set innerjoin_table_new1_t;
retain newgroup max_newgroup;
if _n_ = 1
then do;
declare hash new ();
new.definekey("gtmds_uid");
new.definedata("newgroup");
new.definedone();
max_newgroup = 0;
end;
ng = 0;
do until (last.group);
set innerjoin_table_new1_t;
by group;
if new.find() = 0 then ng = newgroup;
end;
if not ng
then do;
max_newgroup + 1;
ng = max_newgroup;
end;
newgroup = ng;
do until (last.group);
set innerjoin_table_new1_t;
by group;
if new.check() ne 0
then rc = new.add();
output;
end;
drop max_newgroup rc ng;
run;

Kurt_Bremser
Super User

Your code uses gtmds_uid as key of the hash, but this variable does not exist in the example dataset you posted.

Does id play any role in the building of new groups, or is it only the date that defines a group?

Sastech
Fluorite | Level 6
Hi @Kurt_Bremser
Updated the code shared below.
1.Eventhough group 2 and 3 have same ids , they should not come under same new_group because both groups are having different time and they did not intersect/overlap.
2. Single id value should not come under multiple new_groups.




data out;
if 0 then set have;
retain newgroup max_newgroup;
if _n_ = 1
then do;
declare hash new ();
new.definekey("id");
new.definedata("newgroup");
new.definedone();
max_newgroup = 0;
end;
ng = 0;
do until (last.group);
set innerjoin_table_new1_t;
by group;
if new.find() = 0 then ng = newgroup;
end;
if not ng
then do;
max_newgroup + 1;
ng = max_newgroup;
end;
newgroup = ng;
do until (last.group);
set innerjoin_table_new1_t;
by group;
if new.check() ne 0
then rc = new.add();
output;
end;
drop max_newgroup rc ng;
run;
Kurt_Bremser
Super User

You perfectly contradict yourself. In 1) you say that same id's should belong to different newgroups (under certain conditions), but in 2) you say that a single id should always be in the same newgroup.

Please make up your mind.

Sastech
Fluorite | Level 6

Hi @Kurt_Bremser

 

1-6 and 1- 7 (Group 4) need to be set to be with a separate newgroup as this window 25-02-2022 09:18:00 to 25-02-2022 09:33:00 happens at a separate time window which is not overlapping with any other newgroups time window.

 

 

Note: 1. If the time window of the group intersects/overlaps with newgroup and ids match, then the group will be added with the same newgroup. 2. If ids are not matching or time windows are not overlapping, then that group will be generated with newgroup number

 

Sastech_0-1650737806332.png

 

SASTECH1
Fluorite | Level 6

Hi All,

The data needs to be grouped based on conditions.

 

data have;
input id $ group (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1-10 1 27FEB22:12:32:00 27FEB22:13:04:00
1-9 1 27FEB22:12:32:00 27FEB22:13:04:00
2-15 1 27FEB22:12:33:00 27FEB22:13:12:00
1-4 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 6 27FEB22:11:34:00 27FEB22:12:39:00
1-9 6 27FEB22:11:34:00 27FEB22:12:39:00
103-21 7 05APR22:12:26:00 05APR22:13:02:00
103-22 7 05APR22:12:26:00 05APR22:13:02:00
102-58 8 05APR22:12:43:00 05APR22:13:06:00
103-22 8 05APR22:12:43:00 05APR22:13:06:00
103-17 9 05APR22:11:26:00 05APR22:12:34:00
103-19 9 05APR22:11:26:00 05APR22:12:34:00
103-19 10 05APR22:11:27:00 05APR22:12:52:00
103-21 10 05APR22:11:27:00 05APR22:12:52:00
;
run;

Expected Output:

SASTECH1_0-1650559537876.png

Grouping needs to be done based on below conditions.

1. We are forming the variable new_group based on a time span covered by all members of a group.
(time span is defined as the earliest start and latest end of all the observations in a group)
2. all members of a group will be assigned to the same new_group.
3. The first group will be assigned newgroup=1.
4. If the earliest start_time or last end_time of a group falls in between already formed newgroup's earliest start_time and last end_time and also id's of the group also match newgroup then it should be included with the newgroup. Otherwise, it should create a new newsgroup.

 

Note:
1.Eventhough group 2 and 3 have same ids , they should not come under same new_group because both groups are having different time and they did not intersect/overlap.
2. Single id value should not come under multiple new_groups.

 

Please help on this.

Thank you.

ChrisNZ
Tourmaline | Level 20

Why is GROUP 5 not in NEWGROUP 2 since it overlaps?

SASTECH1
Fluorite | Level 6

The ids in group 5 (1-7 and 1-8) are not in Newgroup -2. So only it is not in newgroup2 though time window overlap. 

New groups should be assigned based on both conditions checking ids present in it and also time overlap.

 

Thank you

 

@ChrisNZ 

andreas_lds
Jade | Level 19

Please don't double post questions.

average_joe
Obsidian | Level 7

I'm not sure about the interaction between IDs and formation of new groups because some of the info shared in this post seems inconsistent or contradictory. However, if the current group's range of time overlaps the new group's range of time, you can get the desired outcome.

 

* determine each group's min start and max end;
proc summary data=have nway;
class group;
output out=grp_range (drop=_:) min(start_time)=min_grp_start max(end_time)=max_grp_end;
run;

* determine new groups based only on overlaps of time;
data new_groups;
set grp_range;
retain new_group 0 min_new_grp_start max_new_grp_end;
format min_new_grp_start max_new_grp_end e8601dt19.;

* test if current group overlaps the current new_group;
* if not, then create a new group and reset the new group dates;
if not (min_grp_start <= max_new_grp_end and max_grp_end >= min_new_grp_start) then do;
   new_group + 1;
   min_new_grp_start = min_grp_start;
   max_new_grp_end = max_grp_end;
end;

* dynamically calc new group min and max;
min_new_grp_start = min(min_new_grp_start, min_grp_start);
max_new_grp_end = max(max_new_grp_end, max_grp_end);
run;

Of course, a much larger sample is needed to fully vet this logic.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1464 views
  • 2 likes
  • 7 in conversation