DATA Step, Macro, Functions and more

Return the minimum days between two dates on two data sets by ID

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Return the minimum days between two dates on two data sets by ID

[ Edited ]

I want to be able to return the minimum days between two dates based on two data sets, by ID number.  For instance suppose the two data sets contain the following:

 

Data Set 1:                                                                                Data Set 2:

ID                  Dt1                                                                       ID                           Dt2

1                   3Mar2017                                                             1                             3Mar2017

1                   5Mar2017                                                             2                             5Apr2017

2                   12Mar2017                                                           2                             20Apr2017

3                   1Apr2017                                                              3                            22Apr2017

3                   4Apr2017                                                              3                            1Apr2017

3                   20Apr2017                                                            4                            2Mar2017

5                   5May2017                                                             6                            3Jun2017

 

The process will need to return the information to Data set 1. The data will cycle through the first data set comparing the IDs.  If the ID in data set 1 is the same as the ID in data set 2, then take the difference of the Dates.  Repeat this process until the ID in data set 2 does not match the ID in data set 1.  However, for each iteration, keep the minimum date and days between.

 

Data Set 1 Return Info:

ID                   Dt1                          Dt2                            Days_Between

1                     3Mar2017              3Mar2017                  0

1                     5Mar2017              3Mar2017                  2   *

2                     12Mar2017            5Apr2017                   24

3                     1Apr2017               1Apr2017                  0

3                     4Apr2017               1Apr2017                  3 

3                     20Apr2017             22Apr2017                2

4                     .                              2Mar2017                 .     **

5                     5May2017              .                                 .    **

                            

Note:

* This this can be negative

** If the ID is not present in either data return  a NULL


Accepted Solutions
Solution
‎09-19-2017 08:24 AM
Super User
Super User
Posts: 7,948

Re: Return the minimum days between two dates on two data sets by ID

Something like this?  

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;
Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .

View solution in original post


All Replies
PROC Star
Posts: 1,605

Re: Return the minimum days between two dates on two data sets by ID

[ Edited ]

I can think of two approaches:

1. Hashes- load one dataset into a hash table, look up the values, loop through the hash table to find the closest dates and calculate the min

2.. proc sql-- A join, and carefully calculate. Cartesian, then a group by, then fliter seems relatively simpler but the cartesian in itself hurts my head.

 

 

Frequent Contributor
Posts: 102

Re: Return the minimum days between two dates on two data sets by ID

[ Edited ]

Not exactly your results but on the right track.

 

data dt1;
input id sdate :date8.; format sdate date8.;
cards;
1 3mar2017
1 5Mar2017
2 12Mar2017
3 1Apr2017
3 4Apr2017
3 20Apr2017
5 5May2017

;
data dt2;
input id edate : date8.; format edate date8.;
cards;
1 3Mar2017
2 20Apr2017
3 22Apr2017
3 1Apr2017
4 2Mar2017
6 3Jun2017
;


data want; merge dt1 (in=in1) dt2 (in=in2); by id;

days=intck('day',sdate,edate);
output;

proc print; run;

 

 

It needs some logic to handle the case of more end dates then start dates for an id.      jim

Solution
‎09-19-2017 08:24 AM
Super User
Super User
Posts: 7,948

Re: Return the minimum days between two dates on two data sets by ID

Something like this?  

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;
Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .
Occasional Contributor
Posts: 7

Re: Return the minimum days between two dates on two data sets by ID

Good point about the ties.  It would probably be best just to do a count.  

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 163 views
  • 0 likes
  • 4 in conversation