Hi all,
I am trying to merge dates by the closest day. I just need help tweaking the code a little bit. I tried to look around but could not find a way to accomplish what i need. Any help is appreciated!
Here are the two datasets:
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
1 7/16/2016
1 10/2/2017
2 7/17/2016
3 7/18/2016
4 12/15/2018
4 1/19/2016
5 2/21/2017
5 5/19/2018
5 8/24/2018
6 10/2/2016
6 3/10/2018
7 5/9/2018
;
data bb;
input id received MMDDYY10.;
format received MMDDYY10.;
cards;
1 07/26/2016
1 10/22/2017
2 8/1/2016
3 8/3/2016
3 8/3/2016
4 1/6/2019
5 9/8/2018
6 3/25/2018
7 5/10/2018
7 10/10/2019
;
Data want
ID | Order | Received |
1 | 7/16/2016 | 7/26/2016 |
1 | 10/2/2017 | 10/22/2017 |
2 | 7/17/2016 | 8/1/2016 |
3 | 7/18/2016 | 8/3/2016 |
4 | 12/15/2018 | 1/6/2019 |
4 | 1/19/2016 | |
5 | 2/21/2017 | |
5 | 5/19/2018 | |
5 | 8/24/2018 | 9/8/2018 |
6 | 10/2/2016 | |
6 | 3/10/2018 | 3/25/2018 |
7 | 5/9/2018 | 5/10/2018 |
7 | 10/10/2019 |
Here is my code:
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order;
quit;
I am missing the "having" keyword after "group by". Or would this be easier in data step?
proc sql;
create table want as
select distinct a.*,b.received
from aa a full join bb b
on a.id=b.id and order<=received
and ((received-order) <= <threshold_value_in_days> or missing((received-order)))
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
and (received-order) <= <threshold_value_in_days>
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
and ( (received-order) <= <threshold_value_in_days> or missing(order))
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a full join bb b
on a.id=b.id and order<=received
and ((received-order) <= <threshold_value_in_days> or missing((received-order)))
group by a.id,order;
quit;
A slight change to your code, so that id is set correctly for the "received" that do not have a matching order:
proc sql;
create table want1 as
select distinct
coalesce(a.id,b.id) as id,
a.order,
b.received
from aa a full join bb b
on a.id = b.id and order <= received
and ((received - order) <= 90 or missing((received - order)))
group by calculated id, order
;
quit;
@rajd1 wrote:
This is great! Just as a thought if there was a way not to use the threshold value in days...
You can use the INTCK function to computee the distance in time.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.