BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
J_J_J
Obsidian | Level 7

Hi guys!

Help me please to solve the problem. I try to find score. ID=A gives 1 point, ID=B minus 1 point, but it should be in chronological order, like in attached file. How is it possible to solve it? The result for code 111 should be 1 point.

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. You want this one ?

 

data have;
data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
max=999999999;
rc=h.find();
do while(rc=0); 
  if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
  rc=h.find_next();
end; 

if want then do;
rc=h.find();
do while(rc=0); 
  if want=n_dtstamp then h.removedup();
  rc=h.find_next();
end; 
end;
drop n_dtstamp rc max;
format want datetime.;
run;

 

 

View solution in original post

14 REPLIES 14
andreas_lds
Jade | Level 19

Please post the data you have in usable form (a datastep using datalines, nothing else) and show the expected result.

J_J_J
Obsidian | Level 7

Data:

data want;1
input id evnt_d;2
datalines; 3
A    05Mar2020 13:02:00
B    06Mar2020 13:45:11
B    08Mar2020 13:15:42
B    14Mar2020 12:36:22
A    16Mar2020 14:26:17
A    19Mar2020 14:01:21
A    20Mar2020 18:01:21
B    23Mar2020 8:56:01
B    27Mar2020 8:56:40
A    28Mar2020 10:42:16
B    29Mar2020 10:42:16

; 3

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

Kurt_Bremser
Super User

Please fix your data step, so that it runs without ERROR messages and creates the dataset you have.

Log from your code:

 74         data want;
 75         input id evnt_d;
 76         datalines;
 
 NOTE: Invalid data for id in line 77 1-1.
 NOTE: Invalid data for evnt_d in line 77 6-14.
 REGEL:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
 77         A    05Mar2020 13:02:00
 id=. evnt_d=. _ERROR_=1 _N_=1
 NOTE: Invalid data for id in line 78 1-1.
 NOTE: Invalid data for evnt_d in line 78 6-14.
 78         B    06Mar2020 13:45:11
 id=. evnt_d=. _ERROR_=1 _N_=2
 NOTE: Invalid data for id in line 79 1-1.
 NOTE: Invalid data for evnt_d in line 79 6-14.
 79         B    08Mar2020 13:15:42
 id=. evnt_d=. _ERROR_=1 _N_=3
 NOTE: Invalid data for id in line 80 1-1.
 NOTE: Invalid data for evnt_d in line 80 6-14.
 80         B    14Mar2020 12:36:22
 id=. evnt_d=. _ERROR_=1 _N_=4
 NOTE: Invalid data for id in line 81 1-1.
 NOTE: Invalid data for evnt_d in line 81 6-14.
 81         A    16Mar2020 14:26:17
 id=. evnt_d=. _ERROR_=1 _N_=5
 NOTE: Invalid data for id in line 82 1-1.
 NOTE: Invalid data for evnt_d in line 82 6-14.
 82         A    19Mar2020 14:01:21
 id=. evnt_d=. _ERROR_=1 _N_=6
 NOTE: Invalid data for id in line 83 1-1.
 NOTE: Invalid data for evnt_d in line 83 6-14.
 83         A    20Mar2020 18:01:21
 id=. evnt_d=. _ERROR_=1 _N_=7
 NOTE: Invalid data for id in line 84 1-1.
 NOTE: Invalid data for evnt_d in line 84 6-14.
 84         B    23Mar2020 8:56:01
 id=. evnt_d=. _ERROR_=1 _N_=8
 NOTE: Invalid data for id in line 85 1-1.
 NOTE: Invalid data for evnt_d in line 85 6-14.
 85         B    27Mar2020 8:56:40
 id=. evnt_d=. _ERROR_=1 _N_=9
 NOTE: Invalid data for id in line 86 1-1.
 NOTE: Invalid data for evnt_d in line 86 6-14.
 86         A    28Mar2020 10:42:16
 id=. evnt_d=. _ERROR_=1 _N_=10
 NOTE: Invalid data for id in line 87 1-1.
 NOTE: Invalid data for evnt_d in line 87 6-14.
 87         B    29Mar2020 10:42:16
 id=. evnt_d=. _ERROR_=1 _N_=11
J_J_J
Obsidian | Level 7

data want;
input id$ evnt_d$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

 

 

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

andreas_lds
Jade | Level 19

@J_J_J wrote:

data want;
input id$ evnt_d$21.;
cards;
A 05Mar2020 13:02:00
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
A 16Mar2020 14:26:17
A 19Mar2020 14:01:21
A 20Mar2020 18:01:21
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
A 28Mar2020 10:42:16
B 29Mar2020 10:42:16
;
run;

 

 

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16


Sorry, but i can't see how the logic you have described in the starting message could create the dataset you want.

J_J_J
Obsidian | Level 7

I'll try to explain other way. How to select next latest than group_id date from table B for each group_id from table A, provided that this date wasn't used for the previous group? Group_id and evnt_d  are in chronological order.

 

data A;
input group_id$21.;
cards;
05Mar2020 13:02:00
16Mar2020 14:26:17
19Mar2020 14:01:21
20Mar2020 18:01:21
28Mar2020 10:42:16
;
run;

 

data B;
input evnt_d$21.;
cards;
B 06Mar2020 13:45:11
B 08Mar2020 13:15:42
B 14Mar2020 12:36:22
B 23Mar2020 8:56:01
B 27Mar2020 8:56:40
B 29Mar2020 10:42:16
;
run;

 

Expected result:

A    05Mar2020 13:02:00    06Mar2020 13:45:11
A    16Mar2020 14:26:17    23Mar2020 8:56:01
A    19Mar2020 14:01:21    27Mar2020 8:56:40
A    20Mar2020 18:01:21    29Mar2020 10:42:16
A    28Mar2020 10:42:16

mkeintz
PROC Star

For each A record, you want the closest following B record that has not already been paired with an A record:

 

You can do this be using two SET statements:

  1. A SET just for A's
  2. A SET just for B's that read records until the B datetime stamp is greater than the A datetime stamp.  Since the SET statement will read no B record more than once, you won't get the same B assigned to multiple A's:
data have;
input id $ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
run;

data want (drop=_:);
  set have (where=(id='A') rename=(dtstamp=dtstampa));
  do until (dtstampb>dtstampa or end_of_b);
    if end_of_b=0 then set have (where=(_id='B') rename=(id=_id dtstamp=dtstampb)) end=end_of_b;
    else dtstampb=.;
  end;
run;

Note the SET statement for _ID='B' uses an "if end_of_b=0" test to make sure that the data step does not prematurely stop because all the B records have been exhausted. 

 

As to the "rename" parameters.

  1. You have to give distinct names to the datetime stamps for A records and B records, since they must co-exist in the same output record
  2. You have to rename ID to _ID for reading B records.  Otherwise the output records will have the right datetime stamps but will all have ID=B (not A).

 

Edited note:  This program assumes the data are sorted by DTSTAMP.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
J_J_J
Obsidian | Level 7

mkeintz, thank you very much!

Ksharp
Super User

I like this question.

 

data have;
input id $ dtstamp datetime18.;
format dtstamp datetime18. ;
cards;
A 05Mar2020:13:02:00
B 06Mar2020:13:45:11
B 08Mar2020:13:15:42
B 14Mar2020:12:36:22
A 16Mar2020:14:26:17
A 19Mar2020:14:01:21
A 20Mar2020:18:01:21
B 23Mar2020:8:56:01
B 27Mar2020:8:56:40
A 28Mar2020:10:42:16
B 29Mar2020:10:42:16
;
run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(dtstamp=n_dtstamp));
   declare hash h(dataset:'b(rename=(dtstamp=n_dtstamp))',ordered:'a');
   declare hiter hi('h');
   h.definekey('n_dtstamp');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
rc=hi.first();
do while(rc=0); 
  if dtstamp<n_dtstamp then do;want=n_dtstamp; leave;end;
  rc=hi.next();
end; 
if want then do;
rc=hi.next();
h.remove(key:want);
end;
drop n_dtstamp rc ;
format want datetime.;
run;
J_J_J
Obsidian | Level 7

Could you help me to do it with a key please?

 

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;


ID key date
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

Expected result:
key    datea                           dateb
4129 21Jan2021 15:31:53
5951 21Jan2021 15:33:13

1093  27Jan2021 13:04:08
5794  04Feb2021 16:25:07 12Feb2021 14:55:57
6095  04Feb2021 16:26:41 15Feb2021 8:19:39
1031  15Feb2021 14:08:23
2166  26Feb2021 11:44:59

Ksharp
Super User

So you just simply merge it by KEY ?

 

 

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 4129 21Jan2021 15:31:53
A 5951 21Jan2021 15:33:13
A 1093 27Jan2021 13:04:08
A 5794 04Feb2021 16:25:07
A 6095 04Feb2021 16:26:41
B 5794 12Feb2021 14:55:57
B 6095 15Feb2021 8:19:39
A 1031 15Feb2021 14:08:23
A 2166 26Feb2021 11:44:59
;
run;

data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;
J_J_J
Obsidian | Level 7

Ksharp, something goes wrong with this code.

data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;
run;

data a b;
set have;
if id='A' then output a;
else output b;
run;
data want;
if _n_=1 then do;
if 0 then set b(rename=(date=n_dtstamp));
declare hash h(dataset:'b(rename=(date=n_dtstamp))');
h.definekey('key');
h.definedata('n_dtstamp');
h.definedone();
end;
set a;
call missing(n_dtstamp);
rc=h.find();
drop rc;
run;

 

Expected result is:

id    key              date                                n_dtstamp
A    11111111      31Oct2019 10:50:42
A    22222222    31Oct2019 10:51:17      30Mar2020 10:49:35
A    22222222    02Mar2020 15:12:33
B    33333333   16Mar2020 12:23:46
A    44444444    25Mar2020 13:02:00     01Oct2020 8:56:40
A    55555555    27Mar2020 13:20:07
A    66666666    27Mar2020 13:35:35     20Nov2020 14:26:29
A    33333333    30Apr2020 16:09:14     17Nov2020 9:31:58
A    22222222    12Jun2020 15:32:04
A    66666666    12Jun2020 16:42:03
A    33333333    19Aug2020 9:16:55
A    33333333    18Sep2020 9:40:37
A    11111111     12Oct2020 12:32:40
A    11111111     09Nov2020 14:19:16
A    66666666    26Feb2021 11:44:59

 

But I get:

id    key           n_dtstamp                    date
A    11111111  31Oct2019 10:50:42
A    22222222 30Mar2020 10:49:35    31Oct2019 10:51:17
A    22222222 30Mar2020 10:49:35    02Mar2020 15:12:33
A    44444444 01Oct2020 8:56:40      25Mar2020 13:02:00
A    55555555 27Mar2020 13:20:07
A    66666666 20Nov2020 14:26:29    27Mar2020 13:35:35
A    33333333 16Mar2020 12:23:46    30Apr2020 16:09:14
A    22222222 30Mar2020 10:49:35    12Jun2020 15:32:04
A    66666666 20Nov2020 14:26:29    12Jun2020 16:42:03
A    33333333 16Mar2020 12:23:46    19Aug2020 9:16:55
A    33333333 16Mar2020 12:23:46    18Sep2020 9:40:37
A    11111111 12Oct2020 12:32:40
A    11111111 09Nov2020 14:19:16
A    66666666 20Nov2020 14:26:29    26Feb2021 11:44:59

Ksharp
Super User

OK. You want this one ?

 

data have;
data have;
input id $ key $ date datetime18.;
format date datetime18. ;
cards;
A 11111111 31Oct2019 10:50:42
A 22222222 31Oct2019 10:51:17
A 22222222 02Mar2020 15:12:33
B 33333333 16Mar2020 12:23:46
A 44444444 25Mar2020 13:02:00
A 55555555 27Mar2020 13:20:07
A 66666666 27Mar2020 13:35:35
B 22222222 30Mar2020 10:49:35
A 33333333 30Apr2020 16:09:14
A 22222222 12Jun2020 15:32:04
A 66666666 12Jun2020 16:42:03
A 33333333 19Aug2020 9:16:55
A 33333333 18Sep2020 9:40:37
B 44444444 01Oct2020 8:56:40
A 11111111 12Oct2020 12:32:40
A 11111111 09Nov2020 14:19:16
B 33333333 17Nov2020 9:31:58
B 66666666 20Nov2020 14:26:29
A 66666666 26Feb2021 11:44:59
;run;
data a b;
 set have;
 if id='A' then output a;
  else output b;
run;
data want;
 if _n_=1 then do;
   if 0 then set b(rename=(date=n_dtstamp));
   declare hash h(dataset:'b(rename=(date=n_dtstamp))',multidata:'y');
   h.definekey('key');
   h.definedata('n_dtstamp');
   h.definedone();
 end;
set a;
max=999999999;
rc=h.find();
do while(rc=0); 
  if date<n_dtstamp and (n_dtstamp-date)<max then do; want=n_dtstamp;max=(n_dtstamp-date);end;
  rc=h.find_next();
end; 

if want then do;
rc=h.find();
do while(rc=0); 
  if want=n_dtstamp then h.removedup();
  rc=h.find_next();
end; 
end;
drop n_dtstamp rc max;
format want datetime.;
run;

 

 

J_J_J
Obsidian | Level 7

 Ksharp, thank you very much!👍 It works!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1186 views
  • 3 likes
  • 5 in conversation