Re: How to find closest date using proc sql?

renjithr · Posted 02-11-2017 04:51 AM

Hi All,

I have two tables

EMP

id name $ date1 yymmdd10.;

101 ram 2016/01/30
101 ram 2016/02/23

101 ram 2016/02/25
102 sam 2016/02/15

and TESTS

id date yymmdd10.;
101 2016/01/29
101 2016/01/30
102 2016/02/12

I would like to have a table like below:

ID DATE1 Date2

101 2016/01/30 2016/01/29

101 2016/02/23 2016/01/30

101 2016/02/25 2016/01/30

102 2016/02/15 2016/02/12

Please help!

mkeintz · Posted 02-11-2017 08:50 AM

Why is 1/29/2016 in table test closer than 1/30/2016 (also in tests) to 1/30/2016 in EMP?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Shmuel · Posted 02-11-2017 11:53 AM

Run next code:

data one;
  input id name $ date1 yymmdd10. ;
  format date1 date9.;
  seq = _N_;
datalines;
101 ram 2016/01/30
101 ram 2016/02/23
101 ram 2016/02/25
102 sam 2016/02/15
; run;

data two;
  input id date yymmdd10.;
  format date date9.;
datalines;
101 2016/01/29
101 2016/01/30
102 2016/02/12
; run;

proc sql;
    create table temp as select
      a.id, a.seq, a.date1,
      b.date as date2,
      abs(a.date1 - b.date) as diff
    from one as a   
    left join two as b    
    on a.id = b.id 
    group by a.id
    order by id, seq, diff
;quit;

proc sql
   create table want 
   as select id, date1, date2
   from temp
   group by id, seq
   having min(diff)
; quit;
/* last step can be changed by: */
data want;
 set temp;
  by id seq;
     if first.seq;
run;

this code will produce first output line as:

101 2016/01/30 2016/01/30

instead

101 2016/01/30 2016/01/29

If you insist getting the Jan 29 then change last step to:

data want;
 set temp(where=(diff>0));
  by id seq;
     if first.seq;
run;


/* OR */

data want;
 set temp;
  by id seq;
     retain flag; drop flag;
     if first.seq then flag=0;
     if flag=0 and diff>0 then do;
        output; flag=1;
     end;
run;

ErikLund_Jensen · Posted 02-11-2017 12:06 PM

It seems that since you want the test date 2016/01/29 and not 2016/01/30 joined with the emp date 2016/01/30, you want the latest test date that precedes the actual date. The following SQL performs a join on ID where emp date is bigger than tests date and then selects the record with the smallest difference.

proc sql;
create table result (drop=dif) as
select
emp.id, emp.name, emp.date1, tests.date as date2, abs(emp.date1-tests.date) as dif
from emp left join tests
on emp.id = tests.id
where emp.date1 > tests.date
group by emp.id, emp.date1
having dif=min(dif);
quit;

101 2016-01-30 2016-01-29
101 2016-02-23 2016-01-30
101 2016-02-25 2016-01-30
102 2016-02-15 2016-02-12

mkeintz · Posted 02-11-2017 02:37 PM

@ErikLund_Jensen wrote:

It seems that since you want the test date 2016/01/29 and not 2016/01/30 joined with the emp date 2016/01/30, you want the latest test date that precedes the actual date. The following SQL performs a join on ID where emp date is bigger than tests date and then selects the record with the smallest difference.

proc sql;
create table result (drop=dif) as
select
emp.id, emp.name, emp.date1, tests.date as date2, abs(emp.date1-tests.date) as dif
from emp left join tests
on emp.id = tests.id
where emp.date1 > tests.date
group by emp.id, emp.date1
having dif=min(dif);
quit;

101 2016-01-30 2016-01-29
101 2016-02-23 2016-01-30
101 2016-02-25 2016-01-30
102 2016-02-15 2016-02-12

Since you apparently only want test dates preceding emp date, you could even abstain from calclulating DIF by a suitable modification of the JOIN .. ON expression, and corresponding change to the HAVING clause.

proc sql;

create table result as

select emp.id, emp.name, emp.date1, tests.date as date2

from emp left join tests

on emp.id = tests.id and emp.date1> tests.date

group by emp.id, emp.date1

having tests.date=max(tests.date);

quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

renjithr · Posted 02-15-2017 10:41 PM

Thank you all for your suggestions, appreciate it!

How to find closest date using proc sql?