DATA Step, Macro, Functions and more

Joining two datasets on date that's true as of the date in the second dataset

Reply
Frequent Contributor
Posts: 138

Joining two datasets on date that's true as of the date in the second dataset

Hi,

 

I have two datasets: Dataset 1 is at the person level and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2 each person occurs multiple times and has multiple different dates. 

 

Dataset1:

ID1     Date1          var1     var2....etc.

1      10/31/2014     abc      def

2      9/7/2013        xyz       abc

 

Dataset2:

ID2      Date2

1        9/7/2012

1        2/4/2013

1       10/1/2014

2        9/7/2012

2       10/2/2014

 

I'm trying to merge the datasets such that from Dataset 2, I only bring in one record for each person--the record that has the Date2 value that remains true as of that person's Date1 in Dataset1. So for example, ID=1 has a Date1 value of 10/31/2014. I would want the latest Date2 value for ID=1 that occurs before Date1--in this case, 10/1/2014. 

 

Any help is much appreciated.

 

Regular Contributor
Posts: 212

Re: Joining two datasets on date that's true as of the date in the second dataset

Hi mate,

 

If you need simply join these tables where the dataset2  remains only the last dates of each id you can simply do this:

 

 

proc sort data=dataset1(rename=id1=id date1=date);
  by id date;
run;

proc sort data=dataset2(rename=id2=id date2=date);
  by id date;
run;

data dt1;
set dataset1;
by id date;
if last.date then flg = 1;
else flg = 0;
run;
data dt2;
set dataset2;
by id date;
if last.date then flg = 1;
else flg = 0;
run

data merge;
merge dt1(in=a)
dt2(in=b);
by id;
if a;
where flg = 1;
run;

Maybe this helps

 

Contributor ndp
Contributor
Posts: 61

Re: Joining two datasets on date that's true as of the date in the second dataset

Simple proc sql will do as well but you need to list all variables from dataset1 in group by statement:

 

proc sql noprint;
	create table ds3 as
	select a.*, max(b.date2) as date3 format=mmddyy10.
	from ds1 as a left join ds2 as b
	on a.id1=b.id2 and b.date2<=a.date1
	group by a.id1, a.date1, a.var1, a.var2;
quit;
Super User
Posts: 5,083

Re: Joining two datasets on date that's true as of the date in the second dataset

It's probably easier to reorder the records in DATASET2.  That way, you can take the first one that meets the conditions for comparing the dates:

 

proc sort data=dataset2;

   by id descending date2;

run;

 

data want;

merge dataset1 dataset2;

by id;

if date1 >= date2 then do;

   output;

   date1 = '01jan1800'd;

end;

run;

 

This assumes you will never have any dates that fall before January 1, 1800 (and assumes that I can still think straight after pondering this problem).  One of the keys is that DATE1 will be retained, and not re-read from DATASET1.

 

Good luck.

Ask a Question
Discussion stats
  • 3 replies
  • 193 views
  • 0 likes
  • 4 in conversation