BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets: Dataset 1 is at the person level and each person has a corresponding date. There are also a number of person-level categorical variables that I'd like to keep as is if possible. In Dataset 2 each person occurs multiple times and has multiple different dates. 

 

Dataset1:

ID1     Date1          var1     var2....etc.

1      10/31/2014     abc      def

2      9/7/2013        xyz       abc

 

Dataset2:

ID2      Date2

1        9/7/2012

1        2/4/2013

1       10/1/2014

2        9/7/2012

2       10/2/2014

 

I'm trying to merge the datasets such that from Dataset 2, I only bring in one record for each person--the record that has the Date2 value that remains true as of that person's Date1 in Dataset1. So for example, ID=1 has a Date1 value of 10/31/2014. I would want the latest Date2 value for ID=1 that occurs before Date1--in this case, 10/1/2014. 

 

Any help is much appreciated.

 

3 REPLIES 3
DartRodrigo
Lapis Lazuli | Level 10

Hi mate,

 

If you need simply join these tables where the dataset2  remains only the last dates of each id you can simply do this:

 

 

proc sort data=dataset1(rename=id1=id date1=date);
  by id date;
run;

proc sort data=dataset2(rename=id2=id date2=date);
  by id date;
run;

data dt1;
set dataset1;
by id date;
if last.date then flg = 1;
else flg = 0;
run;
data dt2;
set dataset2;
by id date;
if last.date then flg = 1;
else flg = 0;
run

data merge;
merge dt1(in=a)
dt2(in=b);
by id;
if a;
where flg = 1;
run;

Maybe this helps

 

ndp
Quartz | Level 8 ndp
Quartz | Level 8

Simple proc sql will do as well but you need to list all variables from dataset1 in group by statement:

 

proc sql noprint;
	create table ds3 as
	select a.*, max(b.date2) as date3 format=mmddyy10.
	from ds1 as a left join ds2 as b
	on a.id1=b.id2 and b.date2<=a.date1
	group by a.id1, a.date1, a.var1, a.var2;
quit;
Astounding
PROC Star

It's probably easier to reorder the records in DATASET2.  That way, you can take the first one that meets the conditions for comparing the dates:

 

proc sort data=dataset2;

   by id descending date2;

run;

 

data want;

merge dataset1 dataset2;

by id;

if date1 >= date2 then do;

   output;

   date1 = '01jan1800'd;

end;

run;

 

This assumes you will never have any dates that fall before January 1, 1800 (and assumes that I can still think straight after pondering this problem).  One of the keys is that DATE1 will be retained, and not re-read from DATASET1.

 

Good luck.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1709 views
  • 0 likes
  • 4 in conversation