Hi all,
I am trying to merge dates by the closest day. I just need help tweaking the code a little bit. I tried to look around but could not find a way to accomplish what i need. Any help is appreciated!
Here are the two datasets:
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
1 7/16/2016
1 10/2/2017
2 7/17/2016
3 7/18/2016
4 12/15/2018
4 1/19/2016
5 2/21/2017
5 5/19/2018
5 8/24/2018
6 10/2/2016
6 3/10/2018
7 5/9/2018
;
data bb;
input id received MMDDYY10.;
format received MMDDYY10.;
cards;
1 07/26/2016
1 10/22/2017
2 8/1/2016
3 8/3/2016
3 8/3/2016
4 1/6/2019
5 9/8/2018
6 3/25/2018
7 5/10/2018
7 10/10/2019
;
Data want
ID | Order | Received |
1 | 7/16/2016 | 7/26/2016 |
1 | 10/2/2017 | 10/22/2017 |
2 | 7/17/2016 | 8/1/2016 |
3 | 7/18/2016 | 8/3/2016 |
4 | 12/15/2018 | 1/6/2019 |
4 | 1/19/2016 | |
5 | 2/21/2017 | |
5 | 5/19/2018 | |
5 | 8/24/2018 | 9/8/2018 |
6 | 10/2/2016 | |
6 | 3/10/2018 | 3/25/2018 |
7 | 5/9/2018 | 5/10/2018 |
7 | 10/10/2019 |
Here is my code:
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order;
quit;
I am missing the "having" keyword after "group by". Or would this be easier in data step?
proc sql;
create table want as
select distinct a.*,b.received
from aa a full join bb b
on a.id=b.id and order<=received
and ((received-order) <= <threshold_value_in_days> or missing((received-order)))
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
and (received-order) <= <threshold_value_in_days>
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
and ( (received-order) <= <threshold_value_in_days> or missing(order))
group by a.id,order;
quit;
proc sql;
create table want as
select distinct a.*,b.received
from aa a full join bb b
on a.id=b.id and order<=received
and ((received-order) <= <threshold_value_in_days> or missing((received-order)))
group by a.id,order;
quit;
A slight change to your code, so that id is set correctly for the "received" that do not have a matching order:
proc sql;
create table want1 as
select distinct
coalesce(a.id,b.id) as id,
a.order,
b.received
from aa a full join bb b
on a.id = b.id and order <= received
and ((received - order) <= 90 or missing((received - order)))
group by calculated id, order
;
quit;
@rajd1 wrote:
This is great! Just as a thought if there was a way not to use the threshold value in days...
You can use the INTCK function to computee the distance in time.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.