DATA Step, Macro, Functions and more

Join 2 tables based on unique ID and specific date criteria

Accepted Solution Solved
Reply
Contributor
Posts: 41
Accepted Solution

Join 2 tables based on unique ID and specific date criteria

[ Edited ]

Hello,

 

I want to join 2 tables by ID and Date such that table 1 Date is matched to the 3 prior consecutive table 2 Dates.

 

The TIME variable in the below example table is calculated as (TIME=t1.DATE-t2.DATE) and I'm only interested when TIME is 1, 2 and 3. See table below for example data. How can I do this either using proc sql or data steps?

 

 

t1.ID t1.DATE t2.DATE TIME KEEP
1 9/30/2014 9/29/2014 1 YES
1 9/30/2014 9/28/2014 2 YES
1 9/30/2014 9/27/2014 3 YES
1 10/1/2014 9/29/2014 2 NO
1 10/1/2014 9/28/2014 3 NO
2 10/1/2014 9/30/2014 1 YES
2 10/1/2014 9/29/2014 2 YES
2 10/1/2014 9/28/2014 3 YES
3 10/8/2014 10/7/2014 1 NO
3 10/8/2014 10/6/2014 2 NO
3 10/10/2014 10/9/2014 1 YES
3 10/10/2014 10/8/2014 2 YES
3 10/10/2014 10/7/2014 3 YES
4 11/3/2014 11/2/2014 1 NO
4 11/3/2014 11/1/2014 2 NO
4 11/5/2014 11/4/2014 1 YES
4 11/5/2014 11/3/2014 2 YES
4 11/5/2014 11/2/2014 3 YES
4 11/7/2014 11/6/2014 1 YES
4 11/7/2014 11/5/2014 2 YES
4 11/7/2014 11/4/2014 3 YES

Accepted Solutions
Solution
‎12-01-2016 05:43 PM
Trusted Advisor
Posts: 1,022

Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

 

Notes:

  1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
  2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record. 
  3. The hash is cleared when an ID is completely processed.
  4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1 9/30/2014 
1 10/1/2014 
2 10/1/2014 
3 10/8/2014 
3 10/10/2014
4 11/3/2014 
4 11/5/2014 
4 11/7/2014 
run;

data t2;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

  if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

  if _n_=1 then do;
    declare hash h (dataset:'t2 (obs=0)');
      h.definekey('date');
      h.definedata(all:'Y');
      h.definedone();
  end;
  
  do until (last.id);
    set t1 (in=in1)  t2 (in=in2) ;
    by id date;

    if in2 then h.add();
    else if in1 then do;
      date1=date; 
      rc=h.check(key:date1-1)
        +h.check(key:date1-2)
        +h.check(key:date1-3);
      if rc=0 then do time=1 to 3;
        h.find(key:date1-time);
        output;
      end;
    end;
  end;
  rc=h.clear();
run;

 

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,568

Re: Join 2 tables based on unique ID and specific date criteria

Try next code:

 

proc sql;

     create table want as select

        t.id , t1.date, t2.date

    from table1 as t1

    join table2 as t2

    on t1.id = t2.id and

        (t1.date - t2.date) in (1,2,3)

   order by id;

quit;

Frequent Contributor
Posts: 127

Re: Join 2 tables based on unique ID and specific date criteria

Hi,

 

For performances reasons, I would adapt your PROC SQL as follow:

 

proc sql;

     create table want as select

        t.id , t1.date, t2.date

    from table1 as t1

    inner join table2 as t2

         on t1.id = t2.id

    where (t1.date - t2.date) in (1, 2, 3)

     order by id;

quit;

Contributor
Posts: 41

Re: Join 2 tables based on unique ID and specific date criteria

Thanks for the where clause clarification. I will need that when working on larger datasets which I often do.

Contributor
Posts: 41

Re: Join 2 tables based on unique ID and specific date criteria

Hi,

 

This gets very close, thank you. The below code restricts to 1st, 2nd and 3rd prior record but I need to combine records only if there are 3 consecutive dates from table 2. See below table for example of records to keep (yes) and not keep (no). I apologize if I wasn't clear on my original post.

 

proc sql;
    create table WANT as
    select distinct t1.ID, t1.DATE, t2.STARTDATE, (t1.DATE-t2.STARTDATE) as TIME, t2.*
    from TABLE1 as t1
        join TABLE2 as t2 on (t1.ID=t2.ID) and (t1.DATE - t2.STARTDATE) in (1,2,3)
        order by ID, Date, STARTDATE desc;
quit;

 

ID DATE STARTDATE TIME KEEP NOTE
2022 9/30/2014 9/29/2014 1 YES This consecutive group contains (1, 2 & 3)
2022 9/30/2014 9/28/2014 2 YES
2022 9/30/2014 9/27/2014 3 YES
2022 10/1/2014 9/29/2014 2 NO This consecutive group does not include (1)
2022 10/1/2014 9/28/2014 3 NO
2091 10/1/2014 9/30/2014 1 NO This non-consecutive group does not include (2)
2091 10/1/2014 9/28/2014 3 NO
2092 10/8/2014 10/7/2014 1 NO This consecutive group does not include (3)
2092 10/8/2014 10/6/2014 2 NO
2092 10/10/2014 10/9/2014 1 YES This consecutive group contains (1, 2 & 3)
2092 10/10/2014 10/8/2014 2 YES
2092 10/10/2014 10/7/2014 3 YES
2094 11/3/2014 11/2/2014 1 NO This consecutive group does not include (3)
2094 11/3/2014 11/1/2014 2 NO
2094 11/5/2014 11/2/2014 3 NO This only contains (3)
2094 11/7/2014 11/6/2014 1 YES This consecutive group contains (1, 2 & 3)
2094 11/7/2014 11/5/2014 2 YES
2094 11/7/2014 11/4/2014 3 YES
2099 12/3/2014 12/2/2014 1 NO This consecutive group does not include (3)
2099 12/3/2014 12/1/2014 2 NO

 

 

 

Trusted Advisor
Posts: 1,568

Re: Join 2 tables based on unique ID and specific date criteria

You can filter non consequtive IDs by counting observations per ID and saving those with count=3.

It will be easyer to be done as second step, like:

 

 proc freq data=want;

          table ID / out=temp1(keep=ID _freq_ where=(_FREQ_ = 3));

 run;

 data want;

   merge want

              temp1(in=in1);

  by ID;

     if in1;

run;

Solution
‎12-01-2016 05:43 PM
Trusted Advisor
Posts: 1,022

Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

 

Notes:

  1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
  2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record. 
  3. The hash is cleared when an ID is completely processed.
  4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1 9/30/2014 
1 10/1/2014 
2 10/1/2014 
3 10/8/2014 
3 10/10/2014
4 11/3/2014 
4 11/5/2014 
4 11/7/2014 
run;

data t2;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

  if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

  if _n_=1 then do;
    declare hash h (dataset:'t2 (obs=0)');
      h.definekey('date');
      h.definedata(all:'Y');
      h.definedone();
  end;
  
  do until (last.id);
    set t1 (in=in1)  t2 (in=in2) ;
    by id date;

    if in2 then h.add();
    else if in1 then do;
      date1=date; 
      rc=h.check(key:date1-1)
        +h.check(key:date1-2)
        +h.check(key:date1-3);
      if rc=0 then do time=1 to 3;
        h.find(key:date1-time);
        output;
      end;
    end;
  end;
  rc=h.clear();
run;

 

 

Contributor
Posts: 41

Re: Join 2 tables based on unique ID and specific date criteria

Thank you for all the responses! Again, sorry for my delayed response.

 

After some testing I finally got your hash code to work with one small modification. The test code worked perfectly, but my actual T2 dataset had another 40+ variables so I added the below to your code. Hash tables are new to me so I'm not totally clear why I needed to add the multidata part. Thank you.

 

declare hash h (dataset:'t22 (obs=0)', multidata: "Y");

Frequent Contributor
Posts: 127

Re: Join 2 tables based on unique ID and specific date criteria

[ Edited ]

For what it's still worth, here below my small contribution (using simple PROC SQL/Datasteps):

proc sql;
     create table have as
         select  distinct t.id
         ,          t1.date
         ,          t2.date as StartDate
    from table1 as t1
    inner join table2 as t2
         on t1.id = t2.id
    where (t1.date - t2.date) in (1, 2, 3)
    order by id, date, StartDate desc;
quit;

data WANT_Selection (drop= Previous_: count StartDate Time);
	set have;
	by ID Date descending StartDate;

	retain count;

	Previous_Date = LAG1(Date);
	Previous_Time = LAG1(Time);

	if first.Date then do;
		count = 0;
		if Time = 1 then count + 1;
	end; else
	if not first.Date then do;
		if Previous_Date = Date and count < 3 then do;
			if (Previous_Time = 1 and Time = 2) or (Previous_Time = 2 and Time = 3) then count + 1;
		end;
	end;

	if last.Date and count >= 3 then output;
run;

data want;
	merge have 		   (in=inHave)
		   want_selection (in=inSelection);
	by ID Date;

	if inSelection;
run;
Super User
Posts: 10,028

Re: Join 2 tables based on unique ID and specific date criteria

Once you got that table. That would be easy.


data have;
infile cards expandtabs truncover;
input ID	DATE1 : mmddyy10.	DATE2 : mmddyy10.	TIME;
format date1 date2 mmddyy10.;
n+1;
cards;
1	9/30/2014	9/29/2014	1	YES
1	9/30/2014	9/28/2014	2	YES
1	9/30/2014	9/27/2014	3	YES
1	10/1/2014	9/29/2014	2	NO
1	10/1/2014	9/28/2014	3	NO
2	10/1/2014	9/30/2014	1	YES
2	10/1/2014	9/29/2014	2	YES
2	10/1/2014	9/28/2014	3	YES
3	10/8/2014	10/7/2014	1	NO
3	10/8/2014	10/6/2014	2	NO
3	10/10/2014	10/9/2014	1	YES
3	10/10/2014	10/8/2014	2	YES
3	10/10/2014	10/7/2014	3	YES
4	11/3/2014	11/2/2014	1	NO
4	11/3/2014	11/1/2014	2	NO
4	11/5/2014	11/4/2014	1	YES
4	11/5/2014	11/3/2014	2	YES
4	11/5/2014	11/2/2014	3	YES
4	11/7/2014	11/6/2014	1	YES
4	11/7/2014	11/5/2014	2	YES
4	11/7/2014	11/4/2014	3
;
run;
data key;
 set have;
 if time=3 and
    lag(time)=2 and id=lag(id) and
    lag2(time)=1 and id=lag2(id) then do;
 output;
 n=n-1;output;
 n=n-1;output;
 end;
keep n;
run;
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('n');
  h.definedone();
 end;
set have;
if h.check()=0;
drop n;
run;


☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 372 views
  • 7 likes
  • 5 in conversation