🔒 This topic is solved and locked.
Sastech
Fluorite | Level 6

If any gid of group 1 matches a gid of group 2, a new group_id of 1 should be generated, and this continues until the match chain breaks from one group to the next. In addition, the start time of each newly compared group must fall within the start and end times of the previous group.

Groups 1 and 2 will then fall under the same new_group 1, and the minimum start time and maximum end time across both groups will be used as the start and end times for the next group comparison.

 

For example, if group 10 does not match the previous group's gids and its time window also does not fall within any other new_group window, then group 10 becomes a separate group with new group_id 2.
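To make the intended rule concrete, here is a sketch of the logic in Python, purely as an illustration (the function and field names are our own, not part of any SAS code):

```python
def assign_new_groups(groups):
    """groups: list of dicts with 'gids' (a set), 'start' and 'end' times.
    Times can be datetimes or plain numbers; only comparisons are used."""
    new_id = 0
    windows = []  # one entry per new_group: its id, merged gids, merged window
    for g in groups:
        placed = None
        for w in windows:
            # same new_group only if a gid matches AND the start time
            # falls inside the existing new_group window
            if (g["gids"] & w["gids"]) and w["start"] <= g["start"] <= w["end"]:
                placed = w
                break
        if placed is None:  # no match: open the next new_group
            new_id += 1
            placed = {"id": new_id, "gids": set(),
                      "start": g["start"], "end": g["end"]}
            windows.append(placed)
        # widen the window to min start / max end and collect the gids
        placed["gids"] |= g["gids"]
        placed["start"] = min(placed["start"], g["start"])
        placed["end"] = max(placed["end"], g["end"])
        g["new_group"] = placed["id"]
    return groups
```

For instance, two groups sharing gid 85-52 with overlapping windows land in new_group 1, while a group with unrelated gids opens new_group 2.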

 

We have provided a screenshot of the input and the expected output for the grouping.

 

Note: start_time and end_time are read as character in the query below, but in our actual data they are numeric with the DATETIME20. format.

 

data have;
input group gids $ start_time :$16. end_time :$16.;
datalines;
1 85-51 16FEB22:00:03:00 16FEB22:00:55:00
1 85-52 16FEB22:00:03:00 16FEB22:00:55:00
2 85-52 16FEB22:00:07:00 16FEB22:01:38:00
2 85-53 16FEB22:00:07:00 16FEB22:01:38:00
3 85-53 16FEB22:00:16:00 16FEB22:01:40:00
3 85-54 16FEB22:00:16:00 16FEB22:01:40:00
4 85-54 16FEB22:01:18:00 16FEB22:01:58:00
4 85-55 16FEB22:01:18:00 16FEB22:01:58:00
5 85-55 16FEB22:01:20:00 16FEB22:02:51:00
5 85-56 16FEB22:01:20:00 16FEB22:02:51:00
6 85-56 16FEB22:01:21:00 16FEB22:03:50:00
6 85-57 16FEB22:01:21:00 16FEB22:03:50:00
7 85-57 16FEB22:02:35:00 16FEB22:04:17:00
7 85-58 16FEB22:02:35:00 16FEB22:04:17:00
8 85-58 16FEB22:02:47:00 16FEB22:05:39:00
8 85-60 16FEB22:02:47:00 16FEB22:05:39:00
9 85-60 16FEB22:03:59:00 16FEB22:05:39:00
9 89-51 16FEB22:03:59:00 16FEB22:05:39:00
10 85-66 16FEB22:10:11:00 16FEB22:10:52:00
10 85-67 16FEB22:10:11:00 16FEB22:10:52:00
11 89-51 16FEB22:05:01:00 16FEB22:06:10:00
11 89-52 16FEB22:05:01:00 16FEB22:06:10:00
12 89-52 16FEB22:05:02:00 16FEB22:06:10:00
12 89-53 16FEB22:05:02:00 16FEB22:06:10:00
13 89-72 16FEB22:13:57:00 16FEB22:15:19:00
13 89-73 16FEB22:13:57:00 16FEB22:15:19:00
13 89-75 16FEB22:13:57:00 16FEB22:15:19:00
14 89-73 16FEB22:14:00:00 16FEB22:15:19:00
14 89-75 16FEB22:14:00:00 16FEB22:15:19:00
14 89-76 16FEB22:14:00:00 16FEB22:15:19:00
15 89-80 16FEB22:21:11:00 16FEB22:23:05:00
15 89-81 16FEB22:21:11:00 16FEB22:23:05:00
16 89-81 16FEB22:21:13:00 17FEB22:00:31:00
16 89-82 16FEB22:21:13:00 17FEB22:00:31:00
;
run;

 

[Screenshot: input and expected output for grouping]

@Kurt_Bremser 

 

Thank You in advance 

7 REPLIES 7
Kurt_Bremser
Super User

I made some changes to your data step so that we get real datetime values; this is mostly irrelevant for the solution, but it shows how such values are read from datalines.

data have;
input group gids $ (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1 85-51 16FEB22:00:03:00 16FEB22:00:55:00
1 85-52 16FEB22:00:03:00 16FEB22:00:55:00
2 85-52 16FEB22:00:07:00 16FEB22:01:38:00
2 85-53 16FEB22:00:07:00 16FEB22:01:38:00
3 85-53 16FEB22:00:16:00 16FEB22:01:40:00
3 85-54 16FEB22:00:16:00 16FEB22:01:40:00
4 85-54 16FEB22:01:18:00 16FEB22:01:58:00
4 85-55 16FEB22:01:18:00 16FEB22:01:58:00
5 85-55 16FEB22:01:20:00 16FEB22:02:51:00
5 85-56 16FEB22:01:20:00 16FEB22:02:51:00
6 85-56 16FEB22:01:21:00 16FEB22:03:50:00
6 85-57 16FEB22:01:21:00 16FEB22:03:50:00
7 85-57 16FEB22:02:35:00 16FEB22:04:17:00
7 85-58 16FEB22:02:35:00 16FEB22:04:17:00
8 85-58 16FEB22:02:47:00 16FEB22:05:39:00
8 85-60 16FEB22:02:47:00 16FEB22:05:39:00
9 85-60 16FEB22:03:59:00 16FEB22:05:39:00
9 89-51 16FEB22:03:59:00 16FEB22:05:39:00
10 85-66 16FEB22:10:11:00 16FEB22:10:52:00
10 85-67 16FEB22:10:11:00 16FEB22:10:52:00
11 89-51 16FEB22:05:01:00 16FEB22:06:10:00
11 89-52 16FEB22:05:01:00 16FEB22:06:10:00
12 89-52 16FEB22:05:02:00 16FEB22:06:10:00
12 89-53 16FEB22:05:02:00 16FEB22:06:10:00
13 89-72 16FEB22:13:57:00 16FEB22:15:19:00
13 89-73 16FEB22:13:57:00 16FEB22:15:19:00
13 89-75 16FEB22:13:57:00 16FEB22:15:19:00
14 89-73 16FEB22:14:00:00 16FEB22:15:19:00
14 89-75 16FEB22:14:00:00 16FEB22:15:19:00
14 89-76 16FEB22:14:00:00 16FEB22:15:19:00
15 89-80 16FEB22:21:11:00 16FEB22:23:05:00
15 89-81 16FEB22:21:11:00 16FEB22:23:05:00
16 89-81 16FEB22:21:13:00 17FEB22:00:31:00
16 89-82 16FEB22:21:13:00 17FEB22:00:31:00
;

From that, use a hash object to keep track of new groups:

data want;
set have;
by group;
retain newgroup max_newgroup;
if _n_ = 1
then do; /* build a dynamic object which keeps track of gids to newgroup relationship */
  declare hash new ();
  new.definekey("gids");
  new.definedata("newgroup");
  new.definedone();
  max_newgroup = 0;
end;
if first.group and new.find() ne 0
/* at the start of a group, we can't find a match, so we create a new group */
then do;
  max_newgroup + 1;
  newgroup = max_newgroup;
  rc = new.add();
end;
if new.find() ne 0 /* current gids is not found in the hash */
/* since newgroup is either taken from the hash or newly set
  at the start of a group, and retained, we can now add it */
then rc = new.add();
drop max_newgroup rc;
run;
Sastech
Fluorite | Level 6

Hi Kurt,

 

Thank you for the quick response. The code you sent is working fine with the test data we have;

we will now check it against the actual data, which contains 4M rows across different days.

Sastech
Fluorite | Level 6

Problem

 

data have;
input group gids $ (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1 1-10 16FEB22:08:32:00 16FEB22:08:58:00
1 1-11 16FEB22:08:32:00 16FEB22:08:58:00
1 1-8 16FEB22:07:48:00 16FEB22:08:36:00
2 1-11 16FEB22:08:33:00 16FEB22:09:16:00
2 1-12 16FEB22:08:33:00 16FEB22:09:16:00
6 1-8 16FEB22:07:28:00 16FEB22:08:35:00
6 2-57 16FEB22:07:28:00 16FEB22:08:35:00
146 2-56 16FEB22:07:05:00 16FEB22:07:54:00
146 2-57 16FEB22:07:05:00 16FEB22:07:54:00
;
run;
As per the code below, it searches only the first gid of each group, but it needs to check all gids of a group for a match. If any gid in the group matches, the same newgroup should be assigned.

 

if first.group and new.find() ne 0
/* at the start of a group, we can't find a match, so we create a new group */
then do;
max_newgroup + 1;
newgroup = max_newgroup;
rc = new.add();

 

As a result of this, when we analyzed the code with another set of data, gid 2-56 alone went into a new group, even though all of these rows belong to the same group. Could you please let us know how to resolve this?

[Screenshot: output showing gid 2-56 assigned to a separate newgroup]

 

@Kurt_Bremser 

Kurt_Bremser
Super User

So we first need to check each group to see whether any of its gids already has an assigned newgroup, and then process the group a second time to write newgroup:

data want;
if 0 then set have; /* defines variables into the PDV, does nothing else */
retain newgroup max_newgroup;
if _n_ = 1
then do; /* build a dynamic object which keeps track of gids to newgroup relationship */
  declare hash new ();
  new.definekey("gids");
  new.definedata("newgroup");
  new.definedone();
  max_newgroup = 0;
end;
ng = 0;
do until (last.group);
  set have;
  by group;
  if new.find() = 0 then ng = newgroup;
end;
if not ng
then do;
  max_newgroup + 1;
  ng = max_newgroup;
end;
newgroup = ng;
do until (last.group);
  set have;
  by group;
  if new.check() ne 0 /* current gids is not found in the hash */
  then rc = new.add();
  output;
end;
drop max_newgroup rc ng;
run;
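The two-pass logic of this DATA step can be mirrored in Python (a sketch only, with a plain dict standing in for the hash object; the names are mine, not from the SAS code):

```python
from itertools import groupby

def assign_new_groups(rows):
    """rows: list of (group, gid) pairs, already sorted by group
    (the equivalent of BY group)."""
    gid_to_new = {}   # plays the role of the hash object
    max_new = 0
    out = []
    for group, members in groupby(rows, key=lambda r: r[0]):
        members = list(members)
        # pass 1: does ANY gid of this group already have a newgroup?
        ng = next((gid_to_new[g] for _, g in members if g in gid_to_new), None)
        if ng is None:          # none found: open a new group
            max_new += 1
            ng = max_new
        # pass 2: register every gid and emit the rows
        for grp, gid in members:
            gid_to_new.setdefault(gid, ng)
            out.append((grp, gid, ng))
    return out
```

With the problem data above, groups 1, 2, 6 and 146 all chain together through gids 1-11, 1-8 and 2-57, so every row receives newgroup 1, including gid 2-56.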
Sastech
Fluorite | Level 6

Hi Kurt

 

 

data have;
input gids $ group (start_time end_time) (:datetime19.);
format start_time end_time e8601dt19.;
datalines;
1-10 1 27FEB22:12:32:00 27FEB22:13:04:00
1-9 1 27FEB22:12:32:00 27FEB22:13:04:00
2-15 1 27FEB22:12:33:00 27FEB22:13:12:00
1-4 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 2 27FEB22:10:00:00 27FEB22:10:51:00
1-5 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 3 27FEB22:10:27:00 27FEB22:11:39:00
1-6 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 4 25FEB22:09:18:00 25FEB22:09:33:00
1-7 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 5 27FEB22:11:33:00 27FEB22:12:39:00
1-8 6 27FEB22:11:34:00 27FEB22:12:39:00
1-9 6 27FEB22:11:34:00 27FEB22:12:39:00
;
run;

 

We have data for multiple days; previously we had tried this with only a single day of data.

While grouping across multiple days, an issue arises when a group falls on multiple days.

 

As per the screenshot below:

[Screenshot: groups spanning multiple days]

Groups 3 and 4 should not be matched: even though their gids match, they fall on different dates and there is no continuity in the time frame.

 

Note: we face this issue only when working with multiple dates. If one group runs from 25 Feb 11:55 PM to 26 Feb 00:35 AM, another group starts at 00:10 AM on 26 Feb, and their gids match, then they can be a single group.

 

@Kurt_Bremser 

Ksharp
Super User
data have;
input group gids $ start_time : datetime20. end_time : datetime20. ;
format start_time end_time datetime20. ;
datalines;
1 85-51 16FEB22:00:03:00 16FEB22:00:55:00
1 85-52 16FEB22:00:03:00 16FEB22:00:55:00
2 85-52 16FEB22:00:07:00 16FEB22:01:38:00
2 85-53 16FEB22:00:07:00 16FEB22:01:38:00
3 85-53 16FEB22:00:16:00 16FEB22:01:40:00
3 85-54 16FEB22:00:16:00 16FEB22:01:40:00
4 85-54 16FEB22:01:18:00 16FEB22:01:58:00
4 85-55 16FEB22:01:18:00 16FEB22:01:58:00
5 85-55 16FEB22:01:20:00 16FEB22:02:51:00
5 85-56 16FEB22:01:20:00 16FEB22:02:51:00
6 85-56 16FEB22:01:21:00 16FEB22:03:50:00
6 85-57 16FEB22:01:21:00 16FEB22:03:50:00
7 85-57 16FEB22:02:35:00 16FEB22:04:17:00
7 85-58 16FEB22:02:35:00 16FEB22:04:17:00
8 85-58 16FEB22:02:47:00 16FEB22:05:39:00
8 85-60 16FEB22:02:47:00 16FEB22:05:39:00
9 85-60 16FEB22:03:59:00 16FEB22:05:39:00
9 89-51 16FEB22:03:59:00 16FEB22:05:39:00
10 85-66 16FEB22:10:11:00 16FEB22:10:52:00
10 85-67 16FEB22:10:11:00 16FEB22:10:52:00
11 89-51 16FEB22:05:01:00 16FEB22:06:10:00
11 89-52 16FEB22:05:01:00 16FEB22:06:10:00
12 89-52 16FEB22:05:02:00 16FEB22:06:10:00
12 89-53 16FEB22:05:02:00 16FEB22:06:10:00
13 89-72 16FEB22:13:57:00 16FEB22:15:19:00
13 89-73 16FEB22:13:57:00 16FEB22:15:19:00
13 89-75 16FEB22:13:57:00 16FEB22:15:19:00
14 89-73 16FEB22:14:00:00 16FEB22:15:19:00
14 89-75 16FEB22:14:00:00 16FEB22:15:19:00
14 89-76 16FEB22:14:00:00 16FEB22:15:19:00
15 89-80 16FEB22:21:11:00 16FEB22:23:05:00
15 89-81 16FEB22:21:11:00 16FEB22:23:05:00
16 89-81 16FEB22:21:13:00 17FEB22:00:31:00
16 89-82 16FEB22:21:13:00 17FEB22:00:31:00
;
run;

data step1;
 /* explode every interval into one observation per covered second */
 set have(keep=start_time end_time);
 do dt=start_time to end_time; /* datetime values are counted in seconds */
  output;
 end;
 keep dt;
run;
/* keep one row per distinct second */
proc sort data=step1 out=step2 nodupkey;by dt;run;
data step3;
 /* start a new group wherever the timeline has a gap of more than one second */
 set step2;
 if dif(dt) ne 1 then group+1;
 format dt datetime.;
run;
data want;
 if _n_=1 then do; /* load the second-to-group mapping into a hash */
  declare hash h(dataset:'step3');
  h.definekey('dt');
  h.definedata('group');
  h.definedone();
 end;
set have;
 do dt=start_time to end_time;
  call missing(group);
  if h.find()=0 then leave; /* take the group of the first covered second */
 end;
drop dt;
run;
proc sort data=want;by group;run;
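The idea behind this approach, exploding each interval to its covered seconds and then breaking the combined timeline wherever there is a gap, can be sketched in Python (illustration only, using plain integers for datetime seconds; the names are my own):

```python
def group_by_time_continuity(intervals):
    """intervals: list of (start, end) in integer seconds.
    Returns one timeline-group number per interval, mirroring
    step1 (explode), step2 (dedupe/sort) and step3 (break at gaps)."""
    ticks = sorted({t for s, e in intervals for t in range(s, e + 1)})
    tick_group = {}
    grp = 0
    prev = None
    for t in ticks:
        if prev is None or t - prev != 1:   # gap of more than one second
            grp += 1
        tick_group[t] = grp
        prev = t
    # like the final hash lookup: the group of the first covered second
    return [tick_group[s] for s, _ in intervals]
```

Note that exploding every interval to one row per second can become expensive over multiple days of data with millions of rows.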
Sastech
Fluorite | Level 6

Hi Ksharp,

 

Thank you for the quick response. The code you sent is working fine with the test data we have;

we will now check it against the actual data, which contains 4M rows across different days.


Discussion stats
  • 7 replies
  • 1268 views
  • 2 likes
  • 3 in conversation