Solved
Contributor
Posts: 41

# Join 2 tables based on unique ID and specific date criteria

[ Edited ]

Hello,

I want to join 2 tables by ID and Date such that table 1 Date is matched to the 3 prior consecutive table 2 Dates.

The TIME variable in the below example table is calculated as (TIME=t1.DATE-t2.DATE) and I'm only interested when TIME is 1, 2 and 3. See table below for example data. How can I do this either using proc sql or data steps?

 t1.ID t1.DATE t2.DATE TIME KEEP 1 9/30/2014 9/29/2014 1 YES 1 9/30/2014 9/28/2014 2 YES 1 9/30/2014 9/27/2014 3 YES 1 10/1/2014 9/29/2014 2 NO 1 10/1/2014 9/28/2014 3 NO 2 10/1/2014 9/30/2014 1 YES 2 10/1/2014 9/29/2014 2 YES 2 10/1/2014 9/28/2014 3 YES 3 10/8/2014 10/7/2014 1 NO 3 10/8/2014 10/6/2014 2 NO 3 10/10/2014 10/9/2014 1 YES 3 10/10/2014 10/8/2014 2 YES 3 10/10/2014 10/7/2014 3 YES 4 11/3/2014 11/2/2014 1 NO 4 11/3/2014 11/1/2014 2 NO 4 11/5/2014 11/4/2014 1 YES 4 11/5/2014 11/3/2014 2 YES 4 11/5/2014 11/2/2014 3 YES 4 11/7/2014 11/6/2014 1 YES 4 11/7/2014 11/5/2014 2 YES 4 11/7/2014 11/4/2014 3 YES

Accepted Solutions
Solution
‎12-01-2016 05:43 PM
Posts: 1,288

## Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

Notes:

1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record.
3. The hash is cleared when an ID is completely processed.
4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
input id date :mmddyy10.;
format date yymmddn8.;
datalines;
1 9/30/2014
1 10/1/2014
2 10/1/2014
3 10/8/2014
3 10/10/2014
4 11/3/2014
4 11/5/2014
4 11/7/2014
run;

data t2;
input id date :mmddyy10.;
format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

if _n_=1 then do;
declare hash h (dataset:'t2 (obs=0)');
h.definekey('date');
h.definedata(all:'Y');
h.definedone();
end;

do until (last.id);
set t1 (in=in1)  t2 (in=in2) ;
by id date;

else if in1 then do;
date1=date;
rc=h.check(key:date1-1)
+h.check(key:date1-2)
+h.check(key:date1-3);
if rc=0 then do time=1 to 3;
h.find(key:date1-time);
output;
end;
end;
end;
rc=h.clear();
run;

All Replies
Posts: 1,826

## Re: Join 2 tables based on unique ID and specific date criteria

Try next code:

proc sql;

create table want as select

t.id , t1.date, t2.date

from table1 as t1

join table2 as t2

on t1.id = t2.id and

(t1.date - t2.date) in (1,2,3)

order by id;

quit;

Frequent Contributor
Posts: 131

## Re: Join 2 tables based on unique ID and specific date criteria

Hi,

proc sql;

create table want as select

t.id , t1.date, t2.date

from table1 as t1

inner join table2 as t2

on t1.id = t2.id

where (t1.date - t2.date) in (1, 2, 3)

order by id;

quit;

Contributor
Posts: 41

## Re: Join 2 tables based on unique ID and specific date criteria

Thanks for the where clause clarification. I will need that when working on larger datasets which I often do.

Contributor
Posts: 41

## Re: Join 2 tables based on unique ID and specific date criteria

Hi,

This gets very close, thank you. The below code restricts to 1st, 2nd and 3rd prior record but I need to combine records only if there are 3 consecutive dates from table 2. See below table for example of records to keep (yes) and not keep (no). I apologize if I wasn't clear on my original post.

proc sql;
create table WANT as
select distinct t1.ID, t1.DATE, t2.STARTDATE, (t1.DATE-t2.STARTDATE) as TIME, t2.*
from TABLE1 as t1
join TABLE2 as t2 on (t1.ID=t2.ID) and (t1.DATE - t2.STARTDATE) in (1,2,3)
order by ID, Date, STARTDATE desc;
quit;

 ID DATE STARTDATE TIME KEEP NOTE 2022 9/30/2014 9/29/2014 1 YES This consecutive group contains (1, 2 & 3) 2022 9/30/2014 9/28/2014 2 YES 2022 9/30/2014 9/27/2014 3 YES 2022 10/1/2014 9/29/2014 2 NO This consecutive group does not include (1) 2022 10/1/2014 9/28/2014 3 NO 2091 10/1/2014 9/30/2014 1 NO This non-consecutive group does not include (2) 2091 10/1/2014 9/28/2014 3 NO 2092 10/8/2014 10/7/2014 1 NO This consecutive group does not include (3) 2092 10/8/2014 10/6/2014 2 NO 2092 10/10/2014 10/9/2014 1 YES This consecutive group contains (1, 2 & 3) 2092 10/10/2014 10/8/2014 2 YES 2092 10/10/2014 10/7/2014 3 YES 2094 11/3/2014 11/2/2014 1 NO This consecutive group does not include (3) 2094 11/3/2014 11/1/2014 2 NO 2094 11/5/2014 11/2/2014 3 NO This only contains (3) 2094 11/7/2014 11/6/2014 1 YES This consecutive group contains (1, 2 & 3) 2094 11/7/2014 11/5/2014 2 YES 2094 11/7/2014 11/4/2014 3 YES 2099 12/3/2014 12/2/2014 1 NO This consecutive group does not include (3) 2099 12/3/2014 12/1/2014 2 NO

Posts: 1,826

## Re: Join 2 tables based on unique ID and specific date criteria

You can filter non consequtive IDs by counting observations per ID and saving those with count=3.

It will be easyer to be done as second step, like:

proc freq data=want;

table ID / out=temp1(keep=ID _freq_ where=(_FREQ_ = 3));

run;

data want;

merge want

temp1(in=in1);

by ID;

if in1;

run;

Solution
‎12-01-2016 05:43 PM
Posts: 1,288

## Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

Notes:

1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record.
3. The hash is cleared when an ID is completely processed.
4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
input id date :mmddyy10.;
format date yymmddn8.;
datalines;
1 9/30/2014
1 10/1/2014
2 10/1/2014
3 10/8/2014
3 10/10/2014
4 11/3/2014
4 11/5/2014
4 11/7/2014
run;

data t2;
input id date :mmddyy10.;
format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

if _n_=1 then do;
declare hash h (dataset:'t2 (obs=0)');
h.definekey('date');
h.definedata(all:'Y');
h.definedone();
end;

do until (last.id);
set t1 (in=in1)  t2 (in=in2) ;
by id date;

else if in1 then do;
date1=date;
rc=h.check(key:date1-1)
+h.check(key:date1-2)
+h.check(key:date1-3);
if rc=0 then do time=1 to 3;
h.find(key:date1-time);
output;
end;
end;
end;
rc=h.clear();
run;

Contributor
Posts: 41

## Re: Join 2 tables based on unique ID and specific date criteria

Thank you for all the responses! Again, sorry for my delayed response.

After some testing I finally got your hash code to work with one small modification. The test code worked perfectly, but my actual T2 dataset had another 40+ variables so I added the below to your code. Hash tables are new to me so I'm not totally clear why I needed to add the multidata part. Thank you.

declare hash h (dataset:'t22 (obs=0)', multidata: "Y");

Frequent Contributor
Posts: 131

## Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

For what it's still worth, here below my small contribution (using simple PROC SQL/Datasteps):

proc sql;
create table have as
select  distinct t.id
,          t1.date
,          t2.date as StartDate
from table1 as t1
inner join table2 as t2
on t1.id = t2.id
where (t1.date - t2.date) in (1, 2, 3)
order by id, date, StartDate desc;
quit;

data WANT_Selection (drop= Previous_: count StartDate Time);
set have;
by ID Date descending StartDate;

retain count;

Previous_Date = LAG1(Date);
Previous_Time = LAG1(Time);

if first.Date then do;
count = 0;
if Time = 1 then count + 1;
end; else
if not first.Date then do;
if Previous_Date = Date and count < 3 then do;
if (Previous_Time = 1 and Time = 2) or (Previous_Time = 2 and Time = 3) then count + 1;
end;
end;

if last.Date and count >= 3 then output;
run;

data want;
merge have 		   (in=inHave)
want_selection (in=inSelection);
by ID Date;

if inSelection;
run;
Super User
Posts: 10,621

## Re: Join 2 tables based on unique ID and specific date criteria

Once you got that table. That would be easy.

data have;
infile cards expandtabs truncover;
input ID	DATE1 : mmddyy10.	DATE2 : mmddyy10.	TIME;
format date1 date2 mmddyy10.;
n+1;
cards;
1	9/30/2014	9/29/2014	1	YES
1	9/30/2014	9/28/2014	2	YES
1	9/30/2014	9/27/2014	3	YES
1	10/1/2014	9/29/2014	2	NO
1	10/1/2014	9/28/2014	3	NO
2	10/1/2014	9/30/2014	1	YES
2	10/1/2014	9/29/2014	2	YES
2	10/1/2014	9/28/2014	3	YES
3	10/8/2014	10/7/2014	1	NO
3	10/8/2014	10/6/2014	2	NO
3	10/10/2014	10/9/2014	1	YES
3	10/10/2014	10/8/2014	2	YES
3	10/10/2014	10/7/2014	3	YES
4	11/3/2014	11/2/2014	1	NO
4	11/3/2014	11/1/2014	2	NO
4	11/5/2014	11/4/2014	1	YES
4	11/5/2014	11/3/2014	2	YES
4	11/5/2014	11/2/2014	3	YES
4	11/7/2014	11/6/2014	1	YES
4	11/7/2014	11/5/2014	2	YES
4	11/7/2014	11/4/2014	3
;
run;
data key;
set have;
if time=3 and
lag(time)=2 and id=lag(id) and
lag2(time)=1 and id=lag2(id) then do;
output;
n=n-1;output;
n=n-1;output;
end;
keep n;
run;
data want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('n');
h.definedone();
end;
set have;
if h.check()=0;
drop n;
run;

☑ This topic is solved.