Solved: Date Merging With range

rajd1 · Posted 06-01-2020 08:06 PM

Hi all,

I did quite a bit of research on merging dates by ranges on this forum. Here are my 2 datasets. I want to be able to merge the rows by closest date range.

The order dates (in data set AA) occur before the received dates (in dataset BB)

I also have the code that used below:

Data set AA

ID	Order
20	7/16/2016
20	10/2/2017
30	7/17/2016
38	7/18/2016

Data set BB: Some of the IDs received the product on the same date: For example, ID=20 received products EGF and TP on the same date.

ID	name	Received
20	EGF	7/26/2016
20	TP	7/26/2016
20	TP	10/22/2017
20	EGF	10/22/2017
30	TP	8/1/2016
30	ER	8/1/2016
30	CDK	8/1/2016
30	KRA	8/1/2016
38	KRA	8/3/2016
38	TP	8/3/2016
38	CDK	8/3/2016
38	PIK	8/3/2016

This is the final dataset i would like to have. It has the same rows as the dataset AA.

ID	Order	Received
20	7/16/2016	7/26/2016
20	10/2/2017	10/22/2017
30	7/17/2016	8/1/2016
38	7/18/2016	8/3/2016

data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;

PROC SQL;
CREATE TABLE want AS
SELECT
A.*,
B.received,
FROM
aa A
join
bb B
ON
A.ID=B.ID
AND
A.order < B.Received
;
QUIT;

Here i also tried to get a dataset in a different format. Basically have the same rows as dataset BB and have the closest dates

data want;
merge aa(in=a) bb(in=b) ;
by ID;
if order<received then output;
run;

novinosrin · Posted 06-01-2020 08:33 PM

Hi @rajd1 Assuming I understand your description correctly, the solution is pretty straight forward-


data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;

proc sql;
create table want  as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;

View solution in original post

novinosrin · Posted 06-01-2020 08:33 PM

Hi @rajd1 Assuming I understand your description correctly, the solution is pretty straight forward-


data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;

proc sql;
create table want  as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;

rajd1 · Posted 06-01-2020 10:26 PM

That worked perfectly!! Thanks so much 🙂

mkeintz · Posted 06-01-2020 10:31 PM

If the data sets are sorted by id and date, as they are in your sample, then this is a good situation for conditional SET statements in a DATA step, as in:

data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
data want (drop=_date);
  set aa (rename=(order=_date) in=inaa)
      bb (rename=(received=_date) in=inbb);
  by id _date;
  if inaa then set aa;
  if inbb then set bb;
  if lag(inaa)=1 and inbb=1 and first.id=0;
run;

The first SET statement interleaves the two data sets by ID and _DATE (renamed from ORDER and RECEIVED respectively). The conditional SETS re-read AA or BB, but with the order and received variables not renamed. This means their values will not be replaced until another record from the same data set is read, i.e. RECEIVED and ORDER are "retained".

In turn, all you have to do is keep the instance in which the current record is from BB and the immediately preceding record is from AA. The "and first.id=0" condition avoids instances in which one ID ends with an AA record, and the next ID begins with a BB record.

This is a good example of the robustness of the DATA step, requiring much less record comparison than SQL, at least when the data sets are sorted.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

Re: Date Merging With range

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!