Obsidian | Level 7

## searching for the next not repetitive date

Hi guys!

Help me please to solve the problem. I try to find score. ID=A gives 1 point, ID=B minus 1 point, but it should be in chronological order, like in attached file. How is it possible to solve it? The result for code 111 should be 1 point.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: searching for the next not repetitive date

OK. You want this one ?

``````data have;
data have;
input id \$ key \$ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
max=999999999;
rc=h.find();
do while(rc=0);
if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
rc=h.find_next();
end;

if want then do;
rc=h.find();
do while(rc=0);
if want=n_dtstamp then h.removedup();
rc=h.find_next();
end;
end;
drop n_dtstamp rc max;
format want datetime.;
run;
``````

14 REPLIES 14
PROC Star

## Re: searching for the next not repetitive date

Please post the data you have in usable form (a datastep using datalines, nothing else) and show the expected result.

Obsidian | Level 7

## Re: searching for the next not repetitive date

Data:

data want;1
input id evnt_d;2
datalines; 3
A    05Mar2020 13:02:00
B    06Mar2020 13:45:11
B    08Mar2020 13:15:42
B    14Mar2020 12:36:22
A    16Mar2020 14:26:17
A    19Mar2020 14:01:21
A    20Mar2020 18:01:21
B    23Mar2020 8:56:01
B    27Mar2020 8:56:40
A    28Mar2020 10:42:16
B    29Mar2020 10:42:16

; 3

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

Super User

## Re: searching for the next not repetitive date

Please fix your data step, so that it runs without ERROR messages and creates the dataset you have.

``` 74         data want;
75         input id evnt_d;
76         datalines;

NOTE: Invalid data for id in line 77 1-1.
NOTE: Invalid data for evnt_d in line 77 6-14.
REGEL:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
77         A    05Mar2020 13:02:00
id=. evnt_d=. _ERROR_=1 _N_=1
NOTE: Invalid data for id in line 78 1-1.
NOTE: Invalid data for evnt_d in line 78 6-14.
78         B    06Mar2020 13:45:11
id=. evnt_d=. _ERROR_=1 _N_=2
NOTE: Invalid data for id in line 79 1-1.
NOTE: Invalid data for evnt_d in line 79 6-14.
79         B    08Mar2020 13:15:42
id=. evnt_d=. _ERROR_=1 _N_=3
NOTE: Invalid data for id in line 80 1-1.
NOTE: Invalid data for evnt_d in line 80 6-14.
80         B    14Mar2020 12:36:22
id=. evnt_d=. _ERROR_=1 _N_=4
NOTE: Invalid data for id in line 81 1-1.
NOTE: Invalid data for evnt_d in line 81 6-14.
81         A    16Mar2020 14:26:17
id=. evnt_d=. _ERROR_=1 _N_=5
NOTE: Invalid data for id in line 82 1-1.
NOTE: Invalid data for evnt_d in line 82 6-14.
82         A    19Mar2020 14:01:21
id=. evnt_d=. _ERROR_=1 _N_=6
NOTE: Invalid data for id in line 83 1-1.
NOTE: Invalid data for evnt_d in line 83 6-14.
83         A    20Mar2020 18:01:21
id=. evnt_d=. _ERROR_=1 _N_=7
NOTE: Invalid data for id in line 84 1-1.
NOTE: Invalid data for evnt_d in line 84 6-14.
84         B    23Mar2020 8:56:01
id=. evnt_d=. _ERROR_=1 _N_=8
NOTE: Invalid data for id in line 85 1-1.
NOTE: Invalid data for evnt_d in line 85 6-14.
85         B    27Mar2020 8:56:40
id=. evnt_d=. _ERROR_=1 _N_=9
NOTE: Invalid data for id in line 86 1-1.
NOTE: Invalid data for evnt_d in line 86 6-14.
86         A    28Mar2020 10:42:16
id=. evnt_d=. _ERROR_=1 _N_=10
NOTE: Invalid data for id in line 87 1-1.
NOTE: Invalid data for evnt_d in line 87 6-14.
87         B    29Mar2020 10:42:16
id=. evnt_d=. _ERROR_=1 _N_=11
```
Obsidian | Level 7

## Re: searching for the next not repetitive date

data want;
input id\$ evnt_d\$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

PROC Star

## Re: searching for the next not repetitive date

@J_J_J wrote:

data want;
input id\$ evnt_d\$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

Sorry, but i can't see how the logic you have described in the starting message could create the dataset you want.

Obsidian | Level 7

## Re: searching for the next not repetitive date

I'll try to explain other way. How to select next latest than group_id date from table B for each group_id from table A, provided that this date wasn't used for the previous group? Group_id and evnt_d  are in chronological order.

data A;
input group_id\$21.;
cards;
05Mar2020 13:02:00
16Mar2020 14:26:17
19Mar2020 14:01:21
20Mar2020 18:01:21
28Mar2020 10:42:16
;
run;

data B;
input evnt_d\$21.;
cards;
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
B 29Mar2020 10:42:16
;
run;

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

## Re: searching for the next not repetitive date

For each A record, you want the closest following B record that has not already been paired with an A record:

You can do this be using two SET statements:

1. A SET just for A's
2. A SET just for B's that read records until the B datetime stamp is greater than the A datetime stamp.  Since the SET statement will read no B record more than once, you won't get the same B assigned to multiple A's:
``````data have;
input id \$ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
run;

data want (drop=_:);
set have (where=(id='A') rename=(dtstamp=dtstampa));
do until (dtstampb>dtstampa or end_of_b);
if end_of_b=0 then set have (where=(_id='B') rename=(id=_id dtstamp=dtstampb)) end=end_of_b;
else dtstampb=.;
end;
run;``````

Note the SET statement for _ID='B' uses an "if end_of_b=0" test to make sure that the data step does not prematurely stop because all the B records have been exhausted.

As to the "rename" parameters.

1. You have to give distinct names to the datetime stamps for A records and B records, since they must co-exist in the same output record
2. You have to rename ID to _ID for reading B records.  Otherwise the output records will have the right datetime stamps but will all have ID=B (not A).

Edited note:  This program assumes the data are sorted by DTSTAMP.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: searching for the next not repetitive date

mkeintz, thank you very much!

Super User

## Re: searching for the next not repetitive date

I like this question.

``````data have;
input id \$ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
;
run;
data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(dtstamp=n_dtstamp));
declare hash h(dataset:'b(rename=(dtstamp=n_dtstamp))',ordered:'a');
declare hiter hi('h');
h.definekey('n_dtstamp');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
rc=hi.first();
do while(rc=0);
if dtstamp<n_dtstamp then do;want=n_dtstamp; leave;end;
rc=hi.next();
end;
if want then do;
rc=hi.next();
h.remove(key:want);
end;
drop n_dtstamp rc ;
format want datetime.;
run;
``````
Obsidian | Level 7

## Re: searching for the next not repetitive date

Could you help me to do it with a key please?

data have;
input id \$ key \$ date datetime18.;
format date datetime18. ;
cards;

ID key date
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

Expected result:
key    datea                           dateb
4129 21Jan2021 15:31:53
5951 21Jan2021 15:33:13

1093  27Jan2021 13:04:08
5794  04Feb2021 16:25:07 12Feb2021 14:55:57
6095  04Feb2021 16:26:41 15Feb2021 8:19:39
1031  15Feb2021 14:08:23
2166  26Feb2021 11:44:59

Super User

## Re: searching for the next not repetitive date

So you just simply merge it by KEY ?

``````

data have;
input id \$ key \$ date datetime18.;
format date datetime18. ;
cards;
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;``````
Obsidian | Level 7

## Re: searching for the next not repetitive date

Ksharp, something goes wrong with this code.

data have;
input id \$ key \$ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;
run;

data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;

Expected result is:

id    key              date                                n_dtstamp
A    11111111      31Oct2019 10:50:42
A    22222222    31Oct2019 10:51:17      30Mar2020 10:49:35
A    22222222    02Mar2020 15:12:33
B    33333333   16Mar2020 12:23:46
A    44444444    25Mar2020 13:02:00     01Oct2020 8:56:40
A    55555555    27Mar2020 13:20:07
A    66666666    27Mar2020 13:35:35     20Nov2020 14:26:29
A    33333333    30Apr2020 16:09:14     17Nov2020 9:31:58
A    22222222    12Jun2020 15:32:04
A    66666666    12Jun2020 16:42:03
A    33333333    19Aug2020 9:16:55
A    33333333    18Sep2020 9:40:37
A    11111111     12Oct2020 12:32:40
A    11111111     09Nov2020 14:19:16
A    66666666    26Feb2021 11:44:59

But I get:

id    key           n_dtstamp                    date
A    11111111  31Oct2019 10:50:42
A    22222222 30Mar2020 10:49:35    31Oct2019 10:51:17
A    22222222 30Mar2020 10:49:35    02Mar2020 15:12:33
A    44444444 01Oct2020 8:56:40      25Mar2020 13:02:00
A    55555555 27Mar2020 13:20:07
A    66666666 20Nov2020 14:26:29    27Mar2020 13:35:35
A    33333333 16Mar2020 12:23:46    30Apr2020 16:09:14
A    22222222 30Mar2020 10:49:35    12Jun2020 15:32:04
A    66666666 20Nov2020 14:26:29    12Jun2020 16:42:03
A    33333333 16Mar2020 12:23:46    19Aug2020 9:16:55
A    33333333 16Mar2020 12:23:46    18Sep2020 9:40:37
A    11111111 12Oct2020 12:32:40
A    11111111 09Nov2020 14:19:16
A    66666666 20Nov2020 14:26:29    26Feb2021 11:44:59

Super User

## Re: searching for the next not repetitive date

OK. You want this one ?

``````data have;
data have;
input id \$ key \$ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
max=999999999;
rc=h.find();
do while(rc=0);
if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
rc=h.find_next();
end;

if want then do;
rc=h.find();
do while(rc=0);
if want=n_dtstamp then h.removedup();
rc=h.find_next();
end;
end;
drop n_dtstamp rc max;
format want datetime.;
run;
``````

Obsidian | Level 7

## Re: searching for the next not repetitive date

Ksharp, thank you very much!👍 It works!

Discussion stats
• 14 replies
• 693 views
• 3 likes
• 5 in conversation