DATA Step, Macro, Functions and more

Join as of 4 different dates

Frequent Contributor
Posts: 138

Join as of 4 different dates



I have a dataset at the person level that has ID and yearly values of a secondary ID that link to Dataset 2:


Dataset 1:

ID     Dataset2_ID_2009     Dataset2_ID_2010      Dataset2_ID_2011

1            123                                  123                            234

2            234                                                                    456


As you can see, the secondary ID can be missing in a given year, and can either change or stay the same from year to year. 


Dataset 2 has the Dataset2 IDs in rows with a start and end date, as well as two categorical vars that are true as of the start and end dates:


Dataset 2:


Dataset_2_ID       Start_dt         End_dt              Categ_var1          Categ_var2

123                       1/3/2000       10/02/2012             abc                    def

123                       2/5/2013       12/01/2013             ghi                     jkl

234                       7/1/2001       9/3/2012                 abc                    def

456                       9/3/2001       4/5/2003                 ghi                     jkl

456                      8/7/2010       5/17/2014                mno                  pqr


What I want is to keep the form of Dataset 1 but pull in the value of Categ_var1 and Categ_var2 as of January 1st of the year of Dataset_2_ID (for all years), like this:


Final dataset:

ID     Dataset2_ID_2009   Categ_var_1_2009  Categ_var_2_2009    Dataset2_ID_2010   Categ_var_1_2010  Categ_var_2_2010   

1            123                                  abc                        def                            123                              abc                           def

(first row continued)

Dataset2_ID_2011    Categ_var_1_2011         Categ_var_2_2011

234                                     abc                                    def


I know how to do this the long way by splitting Dataset 1 into yearly cohorts and doing a join for each year, but I'm hoping that there is a way to do it with fewer steps. 


Any help is much appreciated. 


Super User
Posts: 5,876

Re: Join as of 4 different dates

Posted in reply to Walternate
Dataset1 has bit odd layout. And I'm a bit disturb about dataset 2 id value 234 repeats between different dataset 1 id's..?
Assuming the your yearly cohorts are defined by the dataset 2 id's in dataset1.
And, having data in wide format almost always complicates stuff. Transpose dataset1 and the it's a matter of a simple join operation.
Data never sleeps
Super User
Super User
Posts: 9,599

Re: Join as of 4 different dates

Posted in reply to Walternate

I agree with @LinusH, your first dataset looks odd.  I haven't time as leaving now, but re-arrange your code so that "data" items are in columns, this will make your life much easier:

ID        YEAR      LINK

1          2009       123

1          2010       123



Its never a good idea to put "data" in column names.

Posts: 67

Re: Join as of 4 different dates

Posted in reply to Walternate

You could transpose the first data set, join, and then re-transpose. The following code is not complete but should get you going in the right direction.


proc transpose out=transone;
id id;
var Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011;
data transone2 (drop=_name_);
set transone;
_name_ = substr(right(_name_), 13, 4);
year=input(_name_, 4.);

proc sql;
select *
from transone2, two
where year between year(start_dt) and year(end_dt);

Ask a Question
Discussion stats
  • 3 replies
  • 4 in conversation