Calcite | Level 5

## Return the minimum days between two dates on two data sets by ID

I want to be able to return the minimum days between two dates based on two data sets, by ID number.  For instance suppose the two data sets contain the following:

Data Set 1:                                                                                Data Set 2:

ID                  Dt1                                                                       ID                           Dt2

1                   3Mar2017                                                             1                             3Mar2017

1                   5Mar2017                                                             2                             5Apr2017

2                   12Mar2017                                                           2                             20Apr2017

3                   1Apr2017                                                              3                            22Apr2017

3                   4Apr2017                                                              3                            1Apr2017

3                   20Apr2017                                                            4                            2Mar2017

5                   5May2017                                                             6                            3Jun2017

The process will need to return the information to Data set 1. The data will cycle through the first data set comparing the IDs.  If the ID in data set 1 is the same as the ID in data set 2, then take the difference of the Dates.  Repeat this process until the ID in data set 2 does not match the ID in data set 1.  However, for each iteration, keep the minimum date and days between.

Data Set 1 Return Info:

ID                   Dt1                          Dt2                            Days_Between

1                     3Mar2017              3Mar2017                  0

1                     5Mar2017              3Mar2017                  2   *

2                     12Mar2017            5Apr2017                   24

3                     1Apr2017               1Apr2017                  0

3                     4Apr2017               1Apr2017                  3

3                     20Apr2017             22Apr2017                2

4                     .                              2Mar2017                 .     **

5                     5May2017              .                                 .    **

Note:

* This this can be negative

** If the ID is not present in either data return  a NULL

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Return the minimum days between two dates on two data sets by ID

Something like this?

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

``````proc sql noprint ;
create table want as
select coalesce(a.id,b.id) as id,a.dt1,b.dt2
, case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
from one a
full join two b
on a.id = b.id
group by 1,2
having calculated distance = min(calculated distance)
order by 1,2
;
quit;``````
```Obs    id           dt1           dt2    distance

1      1    2017-03-03    2017-03-03        0
2      1    2017-03-05    2017-03-03        2
3      2    2017-03-12    2017-04-05       24
4      3    2017-04-01    2017-04-01        0
5      3    2017-04-04    2017-04-01        3
6      3    2017-04-20    2017-04-22        2
7      4             .    2017-03-02        .
8      5    2017-05-05             .        .
9      6             .    2017-06-03        .
```
4 REPLIES 4
Tourmaline | Level 20

## Re: Return the minimum days between two dates on two data sets by ID

I can think of two approaches:

1. Hashes- load one dataset into a hash table, look up the values, loop through the hash table to find the closest dates and calculate the min

2.. proc sql-- A join, and carefully calculate. Cartesian, then a group by, then fliter seems relatively simpler but the cartesian in itself hurts my head.

Pyrite | Level 9

## Re: Return the minimum days between two dates on two data sets by ID

Not exactly your results but on the right track.

data dt1;
input id sdate :date8.; format sdate date8.;
cards;
1 3mar2017
1 5Mar2017
2 12Mar2017
3 1Apr2017
3 4Apr2017
3 20Apr2017
5 5May2017

;
data dt2;
input id edate : date8.; format edate date8.;
cards;
1 3Mar2017
2 20Apr2017
3 22Apr2017
3 1Apr2017
4 2Mar2017
6 3Jun2017
;

data want; merge dt1 (in=in1) dt2 (in=in2); by id;

days=intck('day',sdate,edate);
output;

proc print; run;

It needs some logic to handle the case of more end dates then start dates for an id.      jim

Super User

## Re: Return the minimum days between two dates on two data sets by ID

Something like this?

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

``````proc sql noprint ;
create table want as
select coalesce(a.id,b.id) as id,a.dt1,b.dt2
, case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
from one a
full join two b
on a.id = b.id
group by 1,2
having calculated distance = min(calculated distance)
order by 1,2
;
quit;``````
```Obs    id           dt1           dt2    distance

1      1    2017-03-03    2017-03-03        0
2      1    2017-03-05    2017-03-03        2
3      2    2017-03-12    2017-04-05       24
4      3    2017-04-01    2017-04-01        0
5      3    2017-04-04    2017-04-01        3
6      3    2017-04-20    2017-04-22        2
7      4             .    2017-03-02        .
8      5    2017-05-05             .        .
9      6             .    2017-06-03        .
```
Calcite | Level 5

## Re: Return the minimum days between two dates on two data sets by ID

Good point about the ties.  It would probably be best just to do a count.

Discussion stats
• 4 replies
• 1313 views
• 0 likes
• 4 in conversation