Solved: Re: Return the minimum days between two dates on two data sets by ID

Sth19 · Posted 09-18-2017 11:48 AM

I want to be able to return the minimum days between two dates based on two data sets, by ID number. For instance suppose the two data sets contain the following:

Data Set 1: Data Set 2:

ID Dt1 ID Dt2

1 3Mar2017 1 3Mar2017

1 5Mar2017 2 5Apr2017

2 12Mar2017 2 20Apr2017

3 1Apr2017 3 22Apr2017

3 4Apr2017 3 1Apr2017

3 20Apr2017 4 2Mar2017

5 5May2017 6 3Jun2017

The process will need to return the information to Data set 1. The data will cycle through the first data set comparing the IDs. If the ID in data set 1 is the same as the ID in data set 2, then take the difference of the Dates. Repeat this process until the ID in data set 2 does not match the ID in data set 1. However, for each iteration, keep the minimum date and days between.

Data Set 1 Return Info:

ID Dt1 Dt2 Days_Between

1 3Mar2017 3Mar2017 0

1 5Mar2017 3Mar2017 2 *

2 12Mar2017 5Apr2017 24

3 1Apr2017 1Apr2017 0

3 4Apr2017 1Apr2017 3

3 20Apr2017 22Apr2017 2

4 . 2Mar2017 . **

5 5May2017 . . **

Note:

* This this can be negative

** If the ID is not present in either data return a NULL

Tom · Posted 09-18-2017 03:08 PM

Something like this?

What do you want to do about ties? Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;

Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .

View solution in original post

novinosrin · Posted 09-18-2017 01:39 PM

I can think of two approaches:

1. Hashes- load one dataset into a hash table, look up the values, loop through the hash table to find the closest dates and calculate the min

2.. proc sql-- A join, and carefully calculate. Cartesian, then a group by, then fliter seems relatively simpler but the cartesian in itself hurts my head.

Jim_G · Posted 09-18-2017 01:41 PM

Not exactly your results but on the right track.

data dt1;
input id sdate :date8.; format sdate date8.;
cards;
1 3mar2017
1 5Mar2017
2 12Mar2017
3 1Apr2017
3 4Apr2017
3 20Apr2017
5 5May2017

;
data dt2;
input id edate : date8.; format edate date8.;
cards;
1 3Mar2017
2 20Apr2017
3 22Apr2017
3 1Apr2017
4 2Mar2017
6 3Jun2017
;

data want; merge dt1 (in=in1) dt2 (in=in2); by id;

days=intck('day',sdate,edate);
output;

proc print; run;

It needs some logic to handle the case of more end dates then start dates for an id. jim

Tom · Posted 09-18-2017 03:08 PM

Something like this?

What do you want to do about ties? Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;

Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .

Sth19 · Posted 09-18-2017 03:29 PM

Good point about the ties. It would probably be best just to do a count.

Return the minimum days between two dates on two data sets by ID

Re: Return the minimum days between two dates on two data sets by ID

Re: Return the minimum days between two dates on two data sets by ID

Re: Return the minimum days between two dates on two data sets by ID

Re: Return the minimum days between two dates on two data sets by ID

Re: Return the minimum days between two dates on two data sets by ID

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away