Join as of 4 different dates

Walternate · Posted 03-21-2016 10:37 AM

Hi,

I have a dataset at the person level that has ID and yearly values of a secondary ID that link to Dataset 2:

Dataset 1:

ID Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011

1 123 123 234

2 234 456

As you can see, the secondary ID can be missing in a given year, and can either change or stay the same from year to year.

Dataset 2 has the Dataset2 IDs in rows with a start and end date, as well as two categorical vars that are true as of the start and end dates:

Dataset 2:

Dataset_2_ID Start_dt End_dt Categ_var1 Categ_var2

123 1/3/2000 10/02/2012 abc def

123 2/5/2013 12/01/2013 ghi jkl

234 7/1/2001 9/3/2012 abc def

456 9/3/2001 4/5/2003 ghi jkl

456 8/7/2010 5/17/2014 mno pqr

What I want is to keep the form of Dataset 1 but pull in the value of Categ_var1 and Categ_var2 as of January 1st of the year of Dataset_2_ID (for all years), like this:

Final dataset:

ID Dataset2_ID_2009 Categ_var_1_2009 Categ_var_2_2009 Dataset2_ID_2010 Categ_var_1_2010 Categ_var_2_2010

1 123 abc def 123 abc def

(first row continued)

Dataset2_ID_2011 Categ_var_1_2011 Categ_var_2_2011

234 abc def

I know how to do this the long way by splitting Dataset 1 into yearly cohorts and doing a join for each year, but I'm hoping that there is a way to do it with fewer steps.

Any help is much appreciated.

LinusH · Posted 03-21-2016 11:06 AM

Dataset1 has bit odd layout. And I'm a bit disturb about dataset 2 id value 234 repeats between different dataset 1 id's..?
Assuming the your yearly cohorts are defined by the dataset 2 id's in dataset1.
And, having data in wide format almost always complicates stuff. Transpose dataset1 and the it's a matter of a simple join operation.

Data never sleeps

RW9 · Posted 03-21-2016 11:19 AM

I agree with @LinusH, your first dataset looks odd. I haven't time as leaving now, but re-arrange your code so that "data" items are in columns, this will make your life much easier:

ID YEAR LINK

1 2009 123

1 2010 123

...

Its never a good idea to put "data" in column names.

mbuchecker · Posted 03-21-2016 12:53 PM

You could transpose the first data set, join, and then re-transpose. The following code is not complete but should get you going in the right direction.

proc transpose data=work.one out=transone;
id id;
var Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011;
run;
data transone2 (drop=_name_);
set transone;
_name_ = substr(right(_name_), 13, 4);
year=input(_name_, 4.);
run;

proc sql;
select *
from transone2, two
where year between year(start_dt) and year(end_dt);
run;

Michelle

Join as of 4 different dates

Re: Join as of 4 different dates

Re: Join as of 4 different dates

Re: Join as of 4 different dates

SAS Innovate 2026 Registration is Open

Join as of 4 different dates

Re: Join as of 4 different dates

Re: Join as of 4 different dates

Re: Join as of 4 different dates

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away