- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have two tables
EMP
id name $ date1 yymmdd10.;
101 ram 2016/01/30
101 ram 2016/02/23
101 ram 2016/02/25
102 sam 2016/02/15
and TESTS
id date yymmdd10.;
101 2016/01/29
101 2016/01/30
102 2016/02/12
I would like to have a table like below:
ID DATE1 Date2
101 2016/01/30 2016/01/29
101 2016/02/23 2016/01/30
101 2016/02/25 2016/01/30
102 2016/02/15 2016/02/12
Please help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why is 1/29/2016 in table test closer than 1/30/2016 (also in tests) to 1/30/2016 in EMP?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Run next code:
data one;
input id name $ date1 yymmdd10. ;
format date1 date9.;
seq = _N_;
datalines;
101 ram 2016/01/30
101 ram 2016/02/23
101 ram 2016/02/25
102 sam 2016/02/15
; run;
data two;
input id date yymmdd10.;
format date date9.;
datalines;
101 2016/01/29
101 2016/01/30
102 2016/02/12
; run;
proc sql;
create table temp as select
a.id, a.seq, a.date1,
b.date as date2,
abs(a.date1 - b.date) as diff
from one as a
left join two as b
on a.id = b.id
group by a.id
order by id, seq, diff
;quit;
proc sql create table want as select id, date1, date2 from temp group by id, seq having min(diff); quit;/* last step can be changed by: */
data want;
set temp;
by id seq;
if first.seq;
run;
this code will produce first output line as:
101 2016/01/30 2016/01/30
instead
101 2016/01/30 2016/01/29
If you insist getting the Jan 29 then change last step to:
data want;
set temp(where=(diff>0));
by id seq;
if first.seq;
run;
/* OR */
data want;
set temp;
by id seq; retain flag; drop flag;
if first.seq then flag=0;
if flag=0 and diff>0 then do;
output; flag=1;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems that since you want the test date 2016/01/29 and not 2016/01/30 joined with the emp date 2016/01/30, you want the latest test date that precedes the actual date. The following SQL performs a join on ID where emp date is bigger than tests date and then selects the record with the smallest difference.
proc sql;
create table result (drop=dif) as
select
emp.id, emp.name, emp.date1, tests.date as date2, abs(emp.date1-tests.date) as dif
from emp left join tests
on emp.id = tests.id
where emp.date1 > tests.date
group by emp.id, emp.date1
having dif=min(dif);
quit;
101 2016-01-30 2016-01-29
101 2016-02-23 2016-01-30
101 2016-02-25 2016-01-30
102 2016-02-15 2016-02-12
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ErikLund_Jensen wrote:
It seems that since you want the test date 2016/01/29 and not 2016/01/30 joined with the emp date 2016/01/30, you want the latest test date that precedes the actual date. The following SQL performs a join on ID where emp date is bigger than tests date and then selects the record with the smallest difference.
proc sql;
create table result (drop=dif) as
select
emp.id, emp.name, emp.date1, tests.date as date2, abs(emp.date1-tests.date) as dif
from emp left join tests
on emp.id = tests.id
where emp.date1 > tests.date
group by emp.id, emp.date1
having dif=min(dif);
quit;
101 2016-01-30 2016-01-29
101 2016-02-23 2016-01-30
101 2016-02-25 2016-01-30
102 2016-02-15 2016-02-12
Since you apparently only want test dates preceding emp date, you could even abstain from calclulating DIF by a suitable modification of the JOIN .. ON expression, and corresponding change to the HAVING clause.
proc sql;
create table result as
select emp.id, emp.name, emp.date1, tests.date as date2
from emp left join tests
on emp.id = tests.id and emp.date1> tests.date
group by emp.id, emp.date1
having tests.date=max(tests.date);
quit;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all for your suggestions, appreciate it!