I want to be able to return the minimum days between two dates based on two data sets, by ID number. For instance suppose the two data sets contain the following:
Data Set 1: Data Set 2:
ID Dt1 ID Dt2
1 3Mar2017 1 3Mar2017
1 5Mar2017 2 5Apr2017
2 12Mar2017 2 20Apr2017
3 1Apr2017 3 22Apr2017
3 4Apr2017 3 1Apr2017
3 20Apr2017 4 2Mar2017
5 5May2017 6 3Jun2017
The process will need to return the information to Data set 1. The data will cycle through the first data set comparing the IDs. If the ID in data set 1 is the same as the ID in data set 2, then take the difference of the Dates. Repeat this process until the ID in data set 2 does not match the ID in data set 1. However, for each iteration, keep the minimum date and days between.
Data Set 1 Return Info:
ID Dt1 Dt2 Days_Between
1 3Mar2017 3Mar2017 0
1 5Mar2017 3Mar2017 2 *
2 12Mar2017 5Apr2017 24
3 1Apr2017 1Apr2017 0
3 4Apr2017 1Apr2017 3
3 20Apr2017 22Apr2017 2
4 . 2Mar2017 . **
5 5May2017 . . **
Note:
* This this can be negative
** If the ID is not present in either data return a NULL
Something like this?
What do you want to do about ties? Say there are two DT2 values one 2 days before and other 2 days after DT1?
proc sql noprint ;
create table want as
select coalesce(a.id,b.id) as id,a.dt1,b.dt2
, case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
from one a
full join two b
on a.id = b.id
group by 1,2
having calculated distance = min(calculated distance)
order by 1,2
;
quit;
Obs id dt1 dt2 distance 1 1 2017-03-03 2017-03-03 0 2 1 2017-03-05 2017-03-03 2 3 2 2017-03-12 2017-04-05 24 4 3 2017-04-01 2017-04-01 0 5 3 2017-04-04 2017-04-01 3 6 3 2017-04-20 2017-04-22 2 7 4 . 2017-03-02 . 8 5 2017-05-05 . . 9 6 . 2017-06-03 .
I can think of two approaches:
1. Hashes- load one dataset into a hash table, look up the values, loop through the hash table to find the closest dates and calculate the min
2.. proc sql-- A join, and carefully calculate. Cartesian, then a group by, then fliter seems relatively simpler but the cartesian in itself hurts my head.
Not exactly your results but on the right track.
data dt1;
input id sdate :date8.; format sdate date8.;
cards;
1 3mar2017
1 5Mar2017
2 12Mar2017
3 1Apr2017
3 4Apr2017
3 20Apr2017
5 5May2017
;
data dt2;
input id edate : date8.; format edate date8.;
cards;
1 3Mar2017
2 20Apr2017
3 22Apr2017
3 1Apr2017
4 2Mar2017
6 3Jun2017
;
data want; merge dt1 (in=in1) dt2 (in=in2); by id;
days=intck('day',sdate,edate);
output;
proc print; run;
It needs some logic to handle the case of more end dates then start dates for an id. jim
Something like this?
What do you want to do about ties? Say there are two DT2 values one 2 days before and other 2 days after DT1?
proc sql noprint ;
create table want as
select coalesce(a.id,b.id) as id,a.dt1,b.dt2
, case when not missing(a.dt1) and not missing(b.dt2) then abs(a.dt1-b.dt2) end as distance
from one a
full join two b
on a.id = b.id
group by 1,2
having calculated distance = min(calculated distance)
order by 1,2
;
quit;
Obs id dt1 dt2 distance 1 1 2017-03-03 2017-03-03 0 2 1 2017-03-05 2017-03-03 2 3 2 2017-03-12 2017-04-05 24 4 3 2017-04-01 2017-04-01 0 5 3 2017-04-04 2017-04-01 3 6 3 2017-04-20 2017-04-22 2 7 4 . 2017-03-02 . 8 5 2017-05-05 . . 9 6 . 2017-06-03 .
Good point about the ties. It would probably be best just to do a count.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.