BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets: Dataset 1 is at the person level and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2 each person occurs multiple times and has multiple different dates. 

 

Dataset1:

ID1     Date1          var1     var2....etc.

1      10/31/2014     abc      def

2      9/7/2013        xyz       abc

 

Dataset2:

ID2      Date2

1        9/7/2012

1        2/4/2013

1       10/1/2014

2        9/7/2012

2       10/2/2014

 

I'm trying to merge the datasets such that from Dataset 2, I only bring in one record for each person--the record that has the Date2 value that remains true as of that person's Date1 in Dataset1. So for example, ID=1 has a Date1 value of 10/31/2014. I would want the latest Date2 value for ID=1 that occurs before Date1--in this case, 10/1/2014. 

 

Any help is much appreciated.

 

3 REPLIES 3
DartRodrigo
Lapis Lazuli | Level 10

Hi mate,

 

If you need simply join these tables where the dataset2  remains only the last dates of each id you can simply do this:

 

 

proc sort data=dataset1(rename=id1=id date1=date);
  by id date;
run;

proc sort data=dataset2(rename=id2=id date2=date);
  by id date;
run;

data dt1;
set dataset1;
by id date;
if last.date then flg = 1;
else flg = 0;
run;
data dt2;
set dataset2;
by id date;
if last.date then flg = 1;
else flg = 0;
run

data merge;
merge dt1(in=a)
dt2(in=b);
by id;
if a;
where flg = 1;
run;

Maybe this helps

 

ndp
Quartz | Level 8 ndp
Quartz | Level 8

Simple proc sql will do as well but you need to list all variables from dataset1 in group by statement:

 

proc sql noprint;
	create table ds3 as
	select a.*, max(b.date2) as date3 format=mmddyy10.
	from ds1 as a left join ds2 as b
	on a.id1=b.id2 and b.date2<=a.date1
	group by a.id1, a.date1, a.var1, a.var2;
quit;
Astounding
PROC Star

It's probably easier to reorder the records in DATASET2.  That way, you can take the first one that meets the conditions for comparing the dates:

 

proc sort data=dataset2;

   by id descending date2;

run;

 

data want;

merge dataset1 dataset2;

by id;

if date1 >= date2 then do;

   output;

   date1 = '01jan1800'd;

end;

run;

 

This assumes you will never have any dates that fall before January 1, 1800 (and assumes that I can still think straight after pondering this problem).  One of the keys is that DATE1 will be retained, and not re-read from DATASET1.

 

Good luck.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1112 views
  • 0 likes
  • 4 in conversation