Hi,
I have two datasets: Dataset 1 is at the person level and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2 each person occurs multiple times and has multiple different dates.
Dataset1:
ID1 Date1 var1 var2....etc.
1 10/31/2014 abc def
2 9/7/2013 xyz abc
Dataset2:
ID2 Date2
1 9/7/2012
1 2/4/2013
1 10/1/2014
2 9/7/2012
2 10/2/2014
I'm trying to merge the datasets such that from Dataset 2, I only bring in one record for each person--the record that has the Date2 value that remains true as of that person's Date1 in Dataset1. So for example, ID=1 has a Date1 value of 10/31/2014. I would want the latest Date2 value for ID=1 that occurs before Date1--in this case, 10/1/2014.
Any help is much appreciated.
Hi mate,
If you need simply join these tables where the dataset2 remains only the last dates of each id you can simply do this:
proc sort data=dataset1(rename=id1=id date1=date);
by id date;
run;
proc sort data=dataset2(rename=id2=id date2=date);
by id date;
run;
data dt1;
set dataset1;
by id date;
if last.date then flg = 1;
else flg = 0;
run;
data dt2;
set dataset2;
by id date;
if last.date then flg = 1;
else flg = 0;
run
data merge;
merge dt1(in=a)
dt2(in=b);
by id;
if a;
where flg = 1;
run;
Maybe this helps
Simple proc sql will do as well but you need to list all variables from dataset1 in group by statement:
proc sql noprint;
create table ds3 as
select a.*, max(b.date2) as date3 format=mmddyy10.
from ds1 as a left join ds2 as b
on a.id1=b.id2 and b.date2<=a.date1
group by a.id1, a.date1, a.var1, a.var2;
quit;
It's probably easier to reorder the records in DATASET2. That way, you can take the first one that meets the conditions for comparing the dates:
proc sort data=dataset2;
by id descending date2;
run;
data want;
merge dataset1 dataset2;
by id;
if date1 >= date2 then do;
output;
date1 = '01jan1800'd;
end;
run;
This assumes you will never have any dates that fall before January 1, 1800 (and assumes that I can still think straight after pondering this problem). One of the keys is that DATE1 will be retained, and not re-read from DATASET1.
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.