BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WAL83
Obsidian | Level 7

Hello,

 

I want to join 2 tables by ID and Date such that table 1 Date is matched to the 3 prior consecutive table 2 Dates.

 

The TIME variable in the below example table is calculated as (TIME=t1.DATE-t2.DATE) and I'm only interested when TIME is 1, 2 and 3. See table below for example data. How can I do this either using proc sql or data steps?

 

 

t1.ID t1.DATE t2.DATE TIME KEEP
1 9/30/2014 9/29/2014 1 YES
1 9/30/2014 9/28/2014 2 YES
1 9/30/2014 9/27/2014 3 YES
1 10/1/2014 9/29/2014 2 NO
1 10/1/2014 9/28/2014 3 NO
2 10/1/2014 9/30/2014 1 YES
2 10/1/2014 9/29/2014 2 YES
2 10/1/2014 9/28/2014 3 YES
3 10/8/2014 10/7/2014 1 NO
3 10/8/2014 10/6/2014 2 NO
3 10/10/2014 10/9/2014 1 YES
3 10/10/2014 10/8/2014 2 YES
3 10/10/2014 10/7/2014 3 YES
4 11/3/2014 11/2/2014 1 NO
4 11/3/2014 11/1/2014 2 NO
4 11/5/2014 11/4/2014 1 YES
4 11/5/2014 11/3/2014 2 YES
4 11/5/2014 11/2/2014 3 YES
4 11/7/2014 11/6/2014 1 YES
4 11/7/2014 11/5/2014 2 YES
4 11/7/2014 11/4/2014 3 YES
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

 

Notes:

  1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
  2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record. 
  3. The hash is cleared when an ID is completely processed.
  4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1 9/30/2014 
1 10/1/2014 
2 10/1/2014 
3 10/8/2014 
3 10/10/2014
4 11/3/2014 
4 11/5/2014 
4 11/7/2014 
run;

data t2;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

  if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

  if _n_=1 then do;
    declare hash h (dataset:'t2 (obs=0)');
      h.definekey('date');
      h.definedata(all:'Y');
      h.definedone();
  end;
  
  do until (last.id);
    set t1 (in=in1)  t2 (in=in2) ;
    by id date;

    if in2 then h.add();
    else if in1 then do;
      date1=date; 
      rc=h.check(key:date1-1)
        +h.check(key:date1-2)
        +h.check(key:date1-3);
      if rc=0 then do time=1 to 3;
        h.find(key:date1-time);
        output;
      end;
    end;
  end;
  rc=h.clear();
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Shmuel
Garnet | Level 18

Try next code:

 

proc sql;

     create table want as select

        t.id , t1.date, t2.date

    from table1 as t1

    join table2 as t2

    on t1.id = t2.id and

        (t1.date - t2.date) in (1,2,3)

   order by id;

quit;

Florent
Quartz | Level 8

Hi,

 

For performances reasons, I would adapt your PROC SQL as follow:

 

proc sql;

     create table want as select

        t.id , t1.date, t2.date

    from table1 as t1

    inner join table2 as t2

         on t1.id = t2.id

    where (t1.date - t2.date) in (1, 2, 3)

     order by id;

quit;

WAL83
Obsidian | Level 7

Thanks for the where clause clarification. I will need that when working on larger datasets which I often do.

WAL83
Obsidian | Level 7

Hi,

 

This gets very close, thank you. The below code restricts to 1st, 2nd and 3rd prior record but I need to combine records only if there are 3 consecutive dates from table 2. See below table for example of records to keep (yes) and not keep (no). I apologize if I wasn't clear on my original post.

 

proc sql;
    create table WANT as
    select distinct t1.ID, t1.DATE, t2.STARTDATE, (t1.DATE-t2.STARTDATE) as TIME, t2.*
    from TABLE1 as t1
        join TABLE2 as t2 on (t1.ID=t2.ID) and (t1.DATE - t2.STARTDATE) in (1,2,3)
        order by ID, Date, STARTDATE desc;
quit;

 

ID DATE STARTDATE TIME KEEP NOTE
2022 9/30/2014 9/29/2014 1 YES This consecutive group contains (1, 2 & 3)
2022 9/30/2014 9/28/2014 2 YES
2022 9/30/2014 9/27/2014 3 YES
2022 10/1/2014 9/29/2014 2 NO This consecutive group does not include (1)
2022 10/1/2014 9/28/2014 3 NO
2091 10/1/2014 9/30/2014 1 NO This non-consecutive group does not include (2)
2091 10/1/2014 9/28/2014 3 NO
2092 10/8/2014 10/7/2014 1 NO This consecutive group does not include (3)
2092 10/8/2014 10/6/2014 2 NO
2092 10/10/2014 10/9/2014 1 YES This consecutive group contains (1, 2 & 3)
2092 10/10/2014 10/8/2014 2 YES
2092 10/10/2014 10/7/2014 3 YES
2094 11/3/2014 11/2/2014 1 NO This consecutive group does not include (3)
2094 11/3/2014 11/1/2014 2 NO
2094 11/5/2014 11/2/2014 3 NO This only contains (3)
2094 11/7/2014 11/6/2014 1 YES This consecutive group contains (1, 2 & 3)
2094 11/7/2014 11/5/2014 2 YES
2094 11/7/2014 11/4/2014 3 YES
2099 12/3/2014 12/2/2014 1 NO This consecutive group does not include (3)
2099 12/3/2014 12/1/2014 2 NO

 

 

 

Shmuel
Garnet | Level 18

You can filter non consequtive IDs by counting observations per ID and saving those with count=3.

It will be easyer to be done as second step, like:

 

 proc freq data=want;

          table ID / out=temp1(keep=ID _freq_ where=(_FREQ_ = 3));

 run;

 data want;

   merge want

              temp1(in=in1);

  by ID;

     if in1;

run;

mkeintz
PROC Star

This SAS program uses  a hash table to accumulate table 2 records.  Then when a table 1 record is in hand and the 3 immediately preceding dates are in the hash, then retrieve from the hash and output:

 

Notes:

  1. Both table 1 and table 2 (datasets T1 and T2) are assumed to be sorted by id/date.
  2. The "set t1 t2;  by id date;" statements mean that the T1 and T2 records will be interleaved by id/date.  And whenever a T1 and T2 record have the same id/date, then the T1 record precedes the tied T2 record. 
  3. The hash is cleared when an ID is completely processed.
  4. This will be faster than SQL for any sizable data set.
dm 'clear log';
data t1;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1 9/30/2014 
1 10/1/2014 
2 10/1/2014 
3 10/8/2014 
3 10/10/2014
4 11/3/2014 
4 11/5/2014 
4 11/7/2014 
run;

data t2;
  input id date :mmddyy10.;
  format date yymmddn8.;
datalines;
1    9/27/2014
1    9/28/2014
1    9/29/2014
2    9/28/2014
2    9/29/2014
2    9/30/2014
3   10/06/2014
3   10/07/2014
3   10/08/2014
3   10/09/2014
4   11/01/2014
4   11/02/2014
4   11/03/2014
4   11/04/2014
4   11/05/2014
4   11/06/2014
run;

data want (drop=rc rename=(date=date2));

  if 0 then set t1 (rename=(date=date1)) t2 ; /* set the PDV*/

  if _n_=1 then do;
    declare hash h (dataset:'t2 (obs=0)');
      h.definekey('date');
      h.definedata(all:'Y');
      h.definedone();
  end;
  
  do until (last.id);
    set t1 (in=in1)  t2 (in=in2) ;
    by id date;

    if in2 then h.add();
    else if in1 then do;
      date1=date; 
      rc=h.check(key:date1-1)
        +h.check(key:date1-2)
        +h.check(key:date1-3);
      if rc=0 then do time=1 to 3;
        h.find(key:date1-time);
        output;
      end;
    end;
  end;
  rc=h.clear();
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
WAL83
Obsidian | Level 7

Thank you for all the responses! Again, sorry for my delayed response.

 

After some testing I finally got your hash code to work with one small modification. The test code worked perfectly, but my actual T2 dataset had another 40+ variables so I added the below to your code. Hash tables are new to me so I'm not totally clear why I needed to add the multidata part. Thank you.

 

declare hash h (dataset:'t22 (obs=0)', multidata: "Y");

Florent
Quartz | Level 8

For what it's still worth, here below my small contribution (using simple PROC SQL/Datasteps):

proc sql;
     create table have as
         select  distinct t.id
         ,          t1.date
         ,          t2.date as StartDate
    from table1 as t1
    inner join table2 as t2
         on t1.id = t2.id
    where (t1.date - t2.date) in (1, 2, 3)
    order by id, date, StartDate desc;
quit;

data WANT_Selection (drop= Previous_: count StartDate Time);
	set have;
	by ID Date descending StartDate;

	retain count;

	Previous_Date = LAG1(Date);
	Previous_Time = LAG1(Time);

	if first.Date then do;
		count = 0;
		if Time = 1 then count + 1;
	end; else
	if not first.Date then do;
		if Previous_Date = Date and count < 3 then do;
			if (Previous_Time = 1 and Time = 2) or (Previous_Time = 2 and Time = 3) then count + 1;
		end;
	end;

	if last.Date and count >= 3 then output;
run;

data want;
	merge have 		   (in=inHave)
		   want_selection (in=inSelection);
	by ID Date;

	if inSelection;
run;
Ksharp
Super User
Once you got that table. That would be easy.


data have;
infile cards expandtabs truncover;
input ID	DATE1 : mmddyy10.	DATE2 : mmddyy10.	TIME;
format date1 date2 mmddyy10.;
n+1;
cards;
1	9/30/2014	9/29/2014	1	YES
1	9/30/2014	9/28/2014	2	YES
1	9/30/2014	9/27/2014	3	YES
1	10/1/2014	9/29/2014	2	NO
1	10/1/2014	9/28/2014	3	NO
2	10/1/2014	9/30/2014	1	YES
2	10/1/2014	9/29/2014	2	YES
2	10/1/2014	9/28/2014	3	YES
3	10/8/2014	10/7/2014	1	NO
3	10/8/2014	10/6/2014	2	NO
3	10/10/2014	10/9/2014	1	YES
3	10/10/2014	10/8/2014	2	YES
3	10/10/2014	10/7/2014	3	YES
4	11/3/2014	11/2/2014	1	NO
4	11/3/2014	11/1/2014	2	NO
4	11/5/2014	11/4/2014	1	YES
4	11/5/2014	11/3/2014	2	YES
4	11/5/2014	11/2/2014	3	YES
4	11/7/2014	11/6/2014	1	YES
4	11/7/2014	11/5/2014	2	YES
4	11/7/2014	11/4/2014	3
;
run;
data key;
 set have;
 if time=3 and
    lag(time)=2 and id=lag(id) and
    lag2(time)=1 and id=lag2(id) then do;
 output;
 n=n-1;output;
 n=n-1;output;
 end;
keep n;
run;
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('n');
  h.definedone();
 end;
set have;
if h.check()=0;
drop n;
run;


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3384 views
  • 7 likes
  • 5 in conversation