BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajd1
Quartz | Level 8

Hi all,

I did quite a bit of research on merging dates by ranges on this forum. Here are my 2 datasets. I want to be able to merge the rows by closest date range.

The order dates (in data set AA) occur before the received dates (in dataset BB)

I also have the code that used below:

Data set AA

IDOrder
207/16/2016
2010/2/2017
307/17/2016
387/18/2016

 

Data set BB: Some of the IDs received the product on the same date: For example, ID=20 received products EGF and TP on the same date. 

IDnameReceived
20EGF7/26/2016
20TP7/26/2016
20TP10/22/2017
20EGF10/22/2017
30TP8/1/2016
30ER8/1/2016
30CDK8/1/2016
30KRA8/1/2016
38KRA8/3/2016
38TP8/3/2016
38CDK8/3/2016
38PIK8/3/2016

 

This is the final dataset i would like to have. It has the same rows as the dataset AA. 

IDOrderReceived
207/16/20167/26/2016
2010/2/201710/22/2017
307/17/20168/1/2016
387/18/20168/3/2016

 

data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;


PROC SQL;
CREATE TABLE want AS
SELECT
A.*,
B.received,
FROM
aa A
join
bb B
ON
A.ID=B.ID
AND
A.order < B.Received
;
QUIT;

 

Here i also tried to get a dataset in a different format. Basically have the same rows as dataset BB and have the closest dates


data want;
merge aa(in=a) bb(in=b) ;
by ID;
if order<received then output;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @rajd1  Assuming I understand your description correctly, the solution is pretty straight forward-


data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;

proc sql;
create table want  as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;

View solution in original post

3 REPLIES 3
novinosrin
Tourmaline | Level 20

Hi @rajd1  Assuming I understand your description correctly, the solution is pretty straight forward-


data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;

proc sql;
create table want  as
select distinct a.*,b.received
from aa a left join bb b
on a.id=b.id and order<=received
group by a.id,order
having min(received)=received;
quit;
rajd1
Quartz | Level 8
That worked perfectly!! Thanks so much 🙂
mkeintz
PROC Star

If the data sets are sorted by id and date, as they are in your sample, then this is a good situation for conditional SET statements in a DATA step, as in:

 

data aa;
input id order MMDDYY10.;
format order MMDDYY10.;
cards;
20 7/16/2016
20 10/2/2017
30 7/17/2016
38 7/18/2016
;

data bb;
input id name $ received MMDDYY10.;
format received MMDDYY10.;
cards;
20 EG 7/26/2016
20 TP 7/26/2016
20 TP 10/22/2017
20 EG 10/22/2017
30 TP 8/1/2016
30 ER 8/1/2016
30 CD 8/1/2016
30 KR 8/1/2016
38 KR 8/3/2016
38 TP 8/3/2016
38 CD 8/3/2016
38 PI 8/3/2016
;
data want (drop=_date);
  set aa (rename=(order=_date) in=inaa)
      bb (rename=(received=_date) in=inbb);
  by id _date;
  if inaa then set aa;
  if inbb then set bb;
  if lag(inaa)=1 and inbb=1 and first.id=0;
run; 

The first SET statement interleaves the two data sets by ID and _DATE (renamed from ORDER and RECEIVED respectively).  The conditional SETS re-read AA or BB, but with the order and received variables not renamed.  This means their values will not be replaced until another record from the same data set is read, i.e. RECEIVED and ORDER are "retained". 

 

In turn, all you have to do is keep the instance in which the current record is from BB and the immediately preceding record is from AA.  The "and first.id=0" condition avoids instances in which one ID ends with an AA record, and the next ID begins with a BB record.

 

This is a good example of the robustness of the DATA step, requiring much less record comparison than SQL, at least when the data sets are sorted.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1274 views
  • 0 likes
  • 3 in conversation