BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sth19
Calcite | Level 5

I want to be able to return the minimum days between two dates based on two data sets, by ID number.  For instance suppose the two data sets contain the following:

 

Data Set 1:                                                                                Data Set 2:

ID                  Dt1                                                                       ID                           Dt2

1                   3Mar2017                                                             1                             3Mar2017

1                   5Mar2017                                                             2                             5Apr2017

2                   12Mar2017                                                           2                             20Apr2017

3                   1Apr2017                                                              3                            22Apr2017

3                   4Apr2017                                                              3                            1Apr2017

3                   20Apr2017                                                            4                            2Mar2017

5                   5May2017                                                             6                            3Jun2017

 

The process will need to return the information to Data set 1. The data will cycle through the first data set comparing the IDs.  If the ID in data set 1 is the same as the ID in data set 2, then take the difference of the Dates.  Repeat this process until the ID in data set 2 does not match the ID in data set 1.  However, for each iteration, keep the minimum date and days between.

 

Data Set 1 Return Info:

ID                   Dt1                          Dt2                            Days_Between

1                     3Mar2017              3Mar2017                  0

1                     5Mar2017              3Mar2017                  2   *

2                     12Mar2017            5Apr2017                   24

3                     1Apr2017               1Apr2017                  0

3                     4Apr2017               1Apr2017                  3 

3                     20Apr2017             22Apr2017                2

4                     .                              2Mar2017                 .     **

5                     5May2017              .                                 .    **

                            

Note:

* This this can be negative

** If the ID is not present in either data return  a NULL

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Something like this?  

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;
Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

I can think of two approaches:

1. Hashes- load one dataset into a hash table, look up the values, loop through the hash table to find the closest dates and calculate the min

2.. proc sql-- A join, and carefully calculate. Cartesian, then a group by, then fliter seems relatively simpler but the cartesian in itself hurts my head.

 

 

Jim_G
Pyrite | Level 9

Not exactly your results but on the right track.

 

data dt1;
input id sdate :date8.; format sdate date8.;
cards;
1 3mar2017
1 5Mar2017
2 12Mar2017
3 1Apr2017
3 4Apr2017
3 20Apr2017
5 5May2017

;
data dt2;
input id edate : date8.; format edate date8.;
cards;
1 3Mar2017
2 20Apr2017
3 22Apr2017
3 1Apr2017
4 2Mar2017
6 3Jun2017
;


data want; merge dt1 (in=in1) dt2 (in=in2); by id;

days=intck('day',sdate,edate);
output;

proc print; run;

 

 

It needs some logic to handle the case of more end dates then start dates for an id.      jim

Tom
Super User Tom
Super User

Something like this?  

What do you want to do about ties?  Say there are two DT2 values one 2 days before and other 2 days after DT1?

proc sql noprint ;
  create table want as 
  select coalesce(a.id,b.id) as id,a.dt1,b.dt2
       , case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
  from one a 
  full join two b
  on a.id = b.id 
  group by 1,2
  having calculated distance = min(calculated distance)
  order by 1,2
  ;
quit;
Obs    id           dt1           dt2    distance

 1      1    2017-03-03    2017-03-03        0
 2      1    2017-03-05    2017-03-03        2
 3      2    2017-03-12    2017-04-05       24
 4      3    2017-04-01    2017-04-01        0
 5      3    2017-04-04    2017-04-01        3
 6      3    2017-04-20    2017-04-22        2
 7      4             .    2017-03-02        .
 8      5    2017-05-05             .        .
 9      6             .    2017-06-03        .
Sth19
Calcite | Level 5

Good point about the ties.  It would probably be best just to do a count.  

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1228 views
  • 0 likes
  • 4 in conversation