Hi all,
I did quite a bit of research on merging dates by ranges on this forum. Here are my 2 datasets. I want to be able to merge the rows by closest date range.
The order dates (in data set AA) occur before the received dates (in dataset BB)
I also have the code that used below:
Data set AA
ID | Order |
20 | 7/16/2016 |
20 | 10/2/2017 |
30 | 7/17/2016 |
38 | 7/18/2016 |
Data set BB: Some of the IDs received the product on the same date: For example, ID=20 received products EGF and TP on the same date.
ID | name | Received |
20 | EGF | 7/26/2016 |
20 | TP | 7/26/2016 |
20 | TP | 10/22/2017 |
20 | EGF | 10/22/2017 |
30 | TP | 8/1/2016 |
30 | ER | 8/1/2016 |
30 | CDK | 8/1/2016 |
30 | KRA | 8/1/2016 |
38 | KRA | 8/3/2016 |
38 | TP | 8/3/2016 |
38 | CDK | 8/3/2016 |
38 | PIK | 8/3/2016 |
This is the final dataset i would like to have. It has the same rows as the dataset AA.
ID | Order | Received |
20 | 7/16/2016 | 7/26/2016 |
20 | 10/2/2017 | 10/22/2017 |
30 | 7/17/2016 | 8/1/2016 |
38 | 7/18/2016 | 8/3/2016 |
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;
data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
PROC SQL;
CREATE TABLE want AS
SELECT
A.*,
B.received,
FROM
aa A
join
bb B
ON
A.ID=B.ID
AND
A.order < B.Received
;
QUIT;
Here i also tried to get a dataset in a different format. Basically have the same rows as dataset BB and have the closest dates
data want;
merge aa(in=a) bb(in=b) ;
by ID;
if order<received then output;
run;
Hi @rajd1 Assuming I understand your description correctly, the solution is pretty straight forward-
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;
data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;
Hi @rajd1 Assuming I understand your description correctly, the solution is pretty straight forward-
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;
data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
proc sql;
create table want as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;
If the data sets are sorted by id and date, as they are in your sample, then this is a good situation for conditional SET statements in a DATA step, as in:
data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;
data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
data want (drop=_date);
set aa (rename=(order=_date) in=inaa)
bb (rename=(received=_date) in=inbb);
by id _date;
if inaa then set aa;
if inbb then set bb;
if lag(inaa)=1 and inbb=1 and first.id=0;
run;
The first SET statement interleaves the two data sets by ID and _DATE (renamed from ORDER and RECEIVED respectively). The conditional SETS re-read AA or BB, but with the order and received variables not renamed. This means their values will not be replaced until another record from the same data set is read, i.e. RECEIVED and ORDER are "retained".
In turn, all you have to do is keep the instance in which the current record is from BB and the immediately preceding record is from AA. The "and first.id=0" condition avoids instances in which one ID ends with an AA record, and the next ID begins with a BB record.
This is a good example of the robustness of the DATA step, requiring much less record comparison than SQL, at least when the data sets are sorted.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.