BookmarkSubscribeRSS Feed
bradbelf262
Calcite | Level 5

I am trying to select unique encounters based on minimal days between encounters. I would like to choose the closest date from table 1 to its respective closest date in table 2 (if exists) to make a bookend date range, if you will.  There may me more ID-Dates in table 1 than table 2 and vice versa. For for instance my data from t1 would be something like:

ID    time1

A     01/09/2015
A     02/16/2015
B     03/03/2015
C     04/01/2015
C     01/20/2015
C     03/15/2015

t2 similarly:

ID     time2
A     01/29/2015
A     02/19/2015
B     03/06/2015
C     01/27/2015
C     03/18/2015
C     04/04/2015

What I don't want when joining on ID is a combination of all unique dates, rather combinations where the days between are minimum eg.

desired_output:

ID    date_time1    date_time2

A     01/09/2015     01/29/2015
A     02/16/2015     02/19/2015
B     03/03/2015     03/06/2015
C     01/20/2015     01/27/2015
C     03/15/2015     03/18/2015
C     04/01/2015     04/04/2015

I am not sure how to do this, I have tried selecting the max and min but that only gives something like, which is not what I want:

ID    date_time1    date_time2
A     01/09/2015     02/19/2015

 B    03/03/2015     03/06/2015
C     01/20/2015     04/04/2015 

Any ideas/suggestions for how to do this would be greatly appreciated! Thank you!

 

5 REPLIES 5
Astounding
PROC Star

Just to get this question out of the way first ... are your variables actually dates or date-times?

 

There is more than one way to combine data sets.  For example:

 

  • Take the earliest t1 record, match it with the earliest t2 record, etc.
  • Re-use some records.  For each t2 record, take the closest t1 record that is earlier.

Is it possible that t1 would contain 3 records for an ID, while t2 would contain only 2 records for the same ID?

 

It will take just a little more definition to the problem.  The programming won't be simple, but won't be terribly difficult either.

bradbelf262
Calcite | Level 5

Thank you for the response.  The variables are actual dates (mmddyy10. format to be exact).  It is possible that t1 would have more records than t2, also possible (although much more rare) that there would more dates per ID in t2 than t1.  It also is possible that IDs in t1 have no matching ID in t2 or there are dates-id's in t2 with no match in t1.  I hope this helps any future viewers of this and if I am unclear please let me know.

Astounding
PROC Star

Some of the matching rules aren't 100% clear, but I'll go with a reasonable approach:  create one record for each existing t2 record.

 

After both data sets are sorted by ID and TIME1/TIME2:

 

data want;

set t1 (in=in1 rename=(time1=time2))

    t2 (in=in2);

by id time2;

if first.id then time1=.;

retain time1;

if in1 then time1=time2;

if in2 then output;

run;

 

It's untested, but should be OK.  It takes each t2 record and matches it with the most recent t1 record.  The same t1 record could potentially match multiple t2 records.  And there might not be a match if the t2 record is earliest.  In any case, there will be one record in the output for each original t2 record.

PGStats
Opal | Level 21

Assuming you want to minimize the absolute time difference between time1 and time2 :

 

proc sql;
create table t3 as
select t1.ID, time1 format=yymmdd10., time2 format=yymmdd10.
from t1 inner join t2 on t1.ID=t2.ID
group by t1.time1
having abs(t2.time2-t1.time1) = min(abs(t2.time2-t1.time1));
select * from t3;
quit;
PG
Ksharp
Super User

data t1;
input ID  $ time1 : mmddyy10.;
format time1  mmddyy10.;
cards;
A     01/09/2015
A     02/16/2015
;
run;
data t2;
input ID  $ time2 : mmddyy10.;
format time2 mmddyy10.;
cards;
A      01/29/2015
A      02/19/2015
;
run;
proc sql;
select b.*,a.time1
 from t1 as a right join t2 as b
  on a.id=b.id and b.time2 > a.time1
   group by b.id,b.time2
    having b.time2-a.time1=min(b.time2-a.time1);
quit;


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 680 views
  • 0 likes
  • 4 in conversation