03-21-2016 10:37 AM
I have a dataset at the person level that has ID and yearly values of a secondary ID that link to Dataset 2:
ID Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011
1 123 123 234
2 234 456
As you can see, the secondary ID can be missing in a given year, and can either change or stay the same from year to year.
Dataset 2 has the Dataset2 IDs in rows with a start and end date, as well as two categorical vars that are true as of the start and end dates:
Dataset_2_ID Start_dt End_dt Categ_var1 Categ_var2
123 1/3/2000 10/02/2012 abc def
123 2/5/2013 12/01/2013 ghi jkl
234 7/1/2001 9/3/2012 abc def
456 9/3/2001 4/5/2003 ghi jkl
456 8/7/2010 5/17/2014 mno pqr
What I want is to keep the form of Dataset 1 but pull in the value of Categ_var1 and Categ_var2 as of January 1st of the year of Dataset_2_ID (for all years), like this:
ID Dataset2_ID_2009 Categ_var_1_2009 Categ_var_2_2009 Dataset2_ID_2010 Categ_var_1_2010 Categ_var_2_2010
1 123 abc def 123 abc def
(first row continued)
Dataset2_ID_2011 Categ_var_1_2011 Categ_var_2_2011
234 abc def
I know how to do this the long way by splitting Dataset 1 into yearly cohorts and doing a join for each year, but I'm hoping that there is a way to do it with fewer steps.
Any help is much appreciated.
03-21-2016 11:06 AM
03-21-2016 11:19 AM
I agree with @LinusH, your first dataset looks odd. I haven't time as leaving now, but re-arrange your code so that "data" items are in columns, this will make your life much easier:
ID YEAR LINK
1 2009 123
1 2010 123
Its never a good idea to put "data" in column names.
03-21-2016 12:53 PM
You could transpose the first data set, join, and then re-transpose. The following code is not complete but should get you going in the right direction.
proc transpose data=work.one out=transone;
var Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011;
data transone2 (drop=_name_);
_name_ = substr(right(_name_), 13, 4);
from transone2, two
where year between year(start_dt) and year(end_dt);