Hi,
I have two datasets: Dataset 1 is at the person level and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2 each person occurs multiple times and has multiple different dates.
Dataset1:
ID1 Date1 var1 var2....etc.
1 10/31/2014 abc def
2 9/7/2013 xyz abc
Dataset2:
ID2 Date2
1 9/7/2012
1 2/4/2013
1 10/1/2014
2 9/7/2012
2 10/2/2014
I'm trying to merge the datasets such that from Dataset 2, I only bring in one record for each person--the record that has the Date2 value that remains true as of that person's Date1 in Dataset1. So for example, ID=1 has a Date1 value of 10/31/2014. I would want the latest Date2 value for ID=1 that occurs before Date1--in this case, 10/1/2014.
Any help is much appreciated.
Hi mate,
If you need simply join these tables where the dataset2 remains only the last dates of each id you can simply do this:
proc sort data=dataset1(rename=id1=id date1=date);
by id date;
run;
proc sort data=dataset2(rename=id2=id date2=date);
by id date;
run;
data dt1;
set dataset1;
by id date;
if last.date then flg = 1;
else flg = 0;
run;
data dt2;
set dataset2;
by id date;
if last.date then flg = 1;
else flg = 0;
run
data merge;
merge dt1(in=a)
dt2(in=b);
by id;
if a;
where flg = 1;
run;
Maybe this helps
Simple proc sql will do as well but you need to list all variables from dataset1 in group by statement:
proc sql noprint;
create table ds3 as
select a.*, max(b.date2) as date3 format=mmddyy10.
from ds1 as a left join ds2 as b
on a.id1=b.id2 and b.date2<=a.date1
group by a.id1, a.date1, a.var1, a.var2;
quit;
It's probably easier to reorder the records in DATASET2. That way, you can take the first one that meets the conditions for comparing the dates:
proc sort data=dataset2;
by id descending date2;
run;
data want;
merge dataset1 dataset2;
by id;
if date1 >= date2 then do;
output;
date1 = '01jan1800'd;
end;
run;
This assumes you will never have any dates that fall before January 1, 1800 (and assumes that I can still think straight after pondering this problem). One of the keys is that DATE1 will be retained, and not re-read from DATASET1.
Good luck.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.