Hi,
I have a dataset at the person level that has ID and yearly values of a secondary ID that link to Dataset 2:
Dataset 1:
ID Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011
1 123 123 234
2 234 456
As you can see, the secondary ID can be missing in a given year, and can either change or stay the same from year to year.
Dataset 2 has the Dataset2 IDs in rows with a start and end date, as well as two categorical vars that are true as of the start and end dates:
Dataset 2:
Dataset_2_ID Start_dt End_dt Categ_var1 Categ_var2
123 1/3/2000 10/02/2012 abc def
123 2/5/2013 12/01/2013 ghi jkl
234 7/1/2001 9/3/2012 abc def
456 9/3/2001 4/5/2003 ghi jkl
456 8/7/2010 5/17/2014 mno pqr
What I want is to keep the form of Dataset 1 but pull in the value of Categ_var1 and Categ_var2 as of January 1st of the year of Dataset_2_ID (for all years), like this:
Final dataset:
ID Dataset2_ID_2009 Categ_var_1_2009 Categ_var_2_2009 Dataset2_ID_2010 Categ_var_1_2010 Categ_var_2_2010
1 123 abc def 123 abc def
(first row continued)
Dataset2_ID_2011 Categ_var_1_2011 Categ_var_2_2011
234 abc def
I know how to do this the long way by splitting Dataset 1 into yearly cohorts and doing a join for each year, but I'm hoping that there is a way to do it with fewer steps.
Any help is much appreciated.
I agree with @LinusH, your first dataset looks odd. I haven't time as leaving now, but re-arrange your code so that "data" items are in columns, this will make your life much easier:
ID YEAR LINK
1 2009 123
1 2010 123
...
Its never a good idea to put "data" in column names.
You could transpose the first data set, join, and then re-transpose. The following code is not complete but should get you going in the right direction.
proc transpose data=work.one out=transone;
id id;
var Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011;
run;
data transone2 (drop=_name_);
set transone;
_name_ = substr(right(_name_), 13, 4);
year=input(_name_, 4.);
run;
proc sql;
select *
from transone2, two
where year between year(start_dt) and year(end_dt);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.