BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
J_J_J
Obsidian | Level 7

Hi guys!

Help me please to solve the problem. I try to find score. ID=A gives 1 point, ID=B minus 1 point, but it should be in chronological order, like in attached file. How is it possible to solve it? The result for code 111 should be 1 point.

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. You want this one ?

 

data have;
data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
max=999999999;
rc=h.find();
do while(rc=0); 
  if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
  rc=h.find_next();
end; 

if want then do;
rc=h.find();
do while(rc=0); 
  if want=n_dtstamp then h.removedup();
  rc=h.find_next();
end; 
end;
drop n_dtstamp rc max;
format want datetime.;
run;

 

 

View solution in original post

14 REPLIES 14
andreas_lds
Jade | Level 19

Please post the data you have in usable form (a datastep using datalines, nothing else) and show the expected result.

J_J_J
Obsidian | Level 7

Data:

data want;1
input id evnt_d;2
datalines; 3
A    05Mar2020 13:02:00
B    06Mar2020 13:45:11
B    08Mar2020 13:15:42
B    14Mar2020 12:36:22
A    16Mar2020 14:26:17
A    19Mar2020 14:01:21
A    20Mar2020 18:01:21
B    23Mar2020 8:56:01
B    27Mar2020 8:56:40
A    28Mar2020 10:42:16
B    29Mar2020 10:42:16

; 3

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

Kurt_Bremser
Super User

Please fix your data step, so that it runs without ERROR messages and creates the dataset you have.

Log from your code:

 74         data want;
 75         input id evnt_d;
 76         datalines;
 
 NOTE: Invalid data for id in line 77 1-1.
 NOTE: Invalid data for evnt_d in line 77 6-14.
 REGEL:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
 77         A    05Mar2020 13:02:00
 id=. evnt_d=. _ERROR_=1 _N_=1
 NOTE: Invalid data for id in line 78 1-1.
 NOTE: Invalid data for evnt_d in line 78 6-14.
 78         B    06Mar2020 13:45:11
 id=. evnt_d=. _ERROR_=1 _N_=2
 NOTE: Invalid data for id in line 79 1-1.
 NOTE: Invalid data for evnt_d in line 79 6-14.
 79         B    08Mar2020 13:15:42
 id=. evnt_d=. _ERROR_=1 _N_=3
 NOTE: Invalid data for id in line 80 1-1.
 NOTE: Invalid data for evnt_d in line 80 6-14.
 80         B    14Mar2020 12:36:22
 id=. evnt_d=. _ERROR_=1 _N_=4
 NOTE: Invalid data for id in line 81 1-1.
 NOTE: Invalid data for evnt_d in line 81 6-14.
 81         A    16Mar2020 14:26:17
 id=. evnt_d=. _ERROR_=1 _N_=5
 NOTE: Invalid data for id in line 82 1-1.
 NOTE: Invalid data for evnt_d in line 82 6-14.
 82         A    19Mar2020 14:01:21
 id=. evnt_d=. _ERROR_=1 _N_=6
 NOTE: Invalid data for id in line 83 1-1.
 NOTE: Invalid data for evnt_d in line 83 6-14.
 83         A    20Mar2020 18:01:21
 id=. evnt_d=. _ERROR_=1 _N_=7
 NOTE: Invalid data for id in line 84 1-1.
 NOTE: Invalid data for evnt_d in line 84 6-14.
 84         B    23Mar2020 8:56:01
 id=. evnt_d=. _ERROR_=1 _N_=8
 NOTE: Invalid data for id in line 85 1-1.
 NOTE: Invalid data for evnt_d in line 85 6-14.
 85         B    27Mar2020 8:56:40
 id=. evnt_d=. _ERROR_=1 _N_=9
 NOTE: Invalid data for id in line 86 1-1.
 NOTE: Invalid data for evnt_d in line 86 6-14.
 86         A    28Mar2020 10:42:16
 id=. evnt_d=. _ERROR_=1 _N_=10
 NOTE: Invalid data for id in line 87 1-1.
 NOTE: Invalid data for evnt_d in line 87 6-14.
 87         B    29Mar2020 10:42:16
 id=. evnt_d=. _ERROR_=1 _N_=11
J_J_J
Obsidian | Level 7

data want;
input id$ evnt_d$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

 

 

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

andreas_lds
Jade | Level 19

@J_J_J wrote:

data want;
input id$ evnt_d$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

 

 

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16


Sorry, but i can't see how the logic you have described in the starting message could create the dataset you want.

J_J_J
Obsidian | Level 7

I'll try to explain other way. How to select next latest than group_id date from table B for each group_id from table A, provided that this date wasn't used for the previous group? Group_id and evnt_d  are in chronological order.

 

data A;
input group_id$21.;
cards;
05Mar2020 13:02:00
16Mar2020 14:26:17
19Mar2020 14:01:21
20Mar2020 18:01:21
28Mar2020 10:42:16
;
run;

 

data B;
input evnt_d$21.;
cards;
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
B 29Mar2020 10:42:16
;
run;

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

mkeintz
PROC Star

For each A record, you want the closest following B record that has not already been paired with an A record:

 

You can do this be using two SET statements:

  1. A SET just for A's
  2. A SET just for B's that read records until the B datetime stamp is greater than the A datetime stamp.  Since the SET statement will read no B record more than once, you won't get the same B assigned to multiple A's:
data have;
input id $ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
run;

data want (drop=_:);
  set have (where=(id='A') rename=(dtstamp=dtstampa));
  do until (dtstampb>dtstampa or end_of_b);
    if end_of_b=0 then set have (where=(_id='B') rename=(id=_id dtstamp=dtstampb)) end=end_of_b;
    else dtstampb=.;
  end;
run;

Note the SET statement for _ID='B' uses an "if end_of_b=0" test to make sure that the data step does not prematurely stop because all the B records have been exhausted. 

 

As to the "rename" parameters.

  1. You have to give distinct names to the datetime stamps for A records and B records, since they must co-exist in the same output record
  2. You have to rename ID to _ID for reading B records.  Otherwise the output records will have the right datetime stamps but will all have ID=B (not A).

 

Edited note:  This program assumes the data are sorted by DTSTAMP.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
J_J_J
Obsidian | Level 7

mkeintz, thank you very much!

Ksharp
Super User

I like this question.

 

data have;
input id $ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
;
run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(dtstamp=n_dtstamp));
   declare hash h(dataset:'b(rename=(dtstamp=n_dtstamp))',ordered:'a');
   declare hiter hi('h');
   h.definekey('n_dtstamp');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
rc=hi.first();
do while(rc=0); 
  if dtstamp<n_dtstamp then do;want=n_dtstamp; leave;end;
  rc=hi.next();
end; 
if want then do;
rc=hi.next();
h.remove(key:want);
end;
drop n_dtstamp rc ;
format want datetime.;
run;
J_J_J
Obsidian | Level 7

Could you help me to do it with a key please?

 

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;


ID key date
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

Expected result:
key    datea                           dateb
4129 21Jan2021 15:31:53
5951 21Jan2021 15:33:13

1093  27Jan2021 13:04:08
5794  04Feb2021 16:25:07 12Feb2021 14:55:57
6095  04Feb2021 16:26:41 15Feb2021 8:19:39
1031  15Feb2021 14:08:23
2166  26Feb2021 11:44:59

Ksharp
Super User

So you just simply merge it by KEY ?

 

 

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;
J_J_J
Obsidian | Level 7

Ksharp, something goes wrong with this code.

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;
run;

data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;

 

Expected result is:

id    key              date                                n_dtstamp
A    11111111      31Oct2019 10:50:42
A    22222222    31Oct2019 10:51:17      30Mar2020 10:49:35
A    22222222    02Mar2020 15:12:33
B    33333333   16Mar2020 12:23:46
A    44444444    25Mar2020 13:02:00     01Oct2020 8:56:40
A    55555555    27Mar2020 13:20:07
A    66666666    27Mar2020 13:35:35     20Nov2020 14:26:29
A    33333333    30Apr2020 16:09:14     17Nov2020 9:31:58
A    22222222    12Jun2020 15:32:04
A    66666666    12Jun2020 16:42:03
A    33333333    19Aug2020 9:16:55
A    33333333    18Sep2020 9:40:37
A    11111111     12Oct2020 12:32:40
A    11111111     09Nov2020 14:19:16
A    66666666    26Feb2021 11:44:59

 

But I get:

id    key           n_dtstamp                    date
A    11111111  31Oct2019 10:50:42
A    22222222 30Mar2020 10:49:35    31Oct2019 10:51:17
A    22222222 30Mar2020 10:49:35    02Mar2020 15:12:33
A    44444444 01Oct2020 8:56:40      25Mar2020 13:02:00
A    55555555 27Mar2020 13:20:07
A    66666666 20Nov2020 14:26:29    27Mar2020 13:35:35
A    33333333 16Mar2020 12:23:46    30Apr2020 16:09:14
A    22222222 30Mar2020 10:49:35    12Jun2020 15:32:04
A    66666666 20Nov2020 14:26:29    12Jun2020 16:42:03
A    33333333 16Mar2020 12:23:46    19Aug2020 9:16:55
A    33333333 16Mar2020 12:23:46    18Sep2020 9:40:37
A    11111111 12Oct2020 12:32:40
A    11111111 09Nov2020 14:19:16
A    66666666 20Nov2020 14:26:29    26Feb2021 11:44:59

Ksharp
Super User

OK. You want this one ?

 

data have;
data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
max=999999999;
rc=h.find();
do while(rc=0); 
  if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
  rc=h.find_next();
end; 

if want then do;
rc=h.find();
do while(rc=0); 
  if want=n_dtstamp then h.removedup();
  rc=h.find_next();
end; 
end;
drop n_dtstamp rc max;
format want datetime.;
run;

 

 

J_J_J
Obsidian | Level 7

 Ksharp, thank you very much!👍 It works!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 1901 views
  • 3 likes
  • 5 in conversation