BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have a dataset at the person level that has ID and yearly values of a secondary ID that link to Dataset 2:

 

Dataset 1:

ID     Dataset2_ID_2009     Dataset2_ID_2010      Dataset2_ID_2011

1            123                                  123                            234

2            234                                                                    456

 

As you can see, the secondary ID can be missing in a given year, and can either change or stay the same from year to year. 

 

Dataset 2 has the Dataset2 IDs in rows with a start and end date, as well as two categorical vars that are true as of the start and end dates:

 

Dataset 2:

 

Dataset_2_ID       Start_dt         End_dt              Categ_var1          Categ_var2

123                       1/3/2000       10/02/2012             abc                    def

123                       2/5/2013       12/01/2013             ghi                     jkl

234                       7/1/2001       9/3/2012                 abc                    def

456                       9/3/2001       4/5/2003                 ghi                     jkl

456                      8/7/2010       5/17/2014                mno                  pqr

 

What I want is to keep the form of Dataset 1 but pull in the value of Categ_var1 and Categ_var2 as of January 1st of the year of Dataset_2_ID (for all years), like this:

 

Final dataset:

ID     Dataset2_ID_2009   Categ_var_1_2009  Categ_var_2_2009    Dataset2_ID_2010   Categ_var_1_2010  Categ_var_2_2010   

1            123                                  abc                        def                            123                              abc                           def

(first row continued)

Dataset2_ID_2011    Categ_var_1_2011         Categ_var_2_2011

234                                     abc                                    def

 

I know how to do this the long way by splitting Dataset 1 into yearly cohorts and doing a join for each year, but I'm hoping that there is a way to do it with fewer steps. 

 

Any help is much appreciated. 

 

3 REPLIES 3
LinusH
Tourmaline | Level 20
Dataset1 has bit odd layout. And I'm a bit disturb about dataset 2 id value 234 repeats between different dataset 1 id's..?
Assuming the your yearly cohorts are defined by the dataset 2 id's in dataset1.
And, having data in wide format almost always complicates stuff. Transpose dataset1 and the it's a matter of a simple join operation.
Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I agree with @LinusH, your first dataset looks odd.  I haven't time as leaving now, but re-arrange your code so that "data" items are in columns, this will make your life much easier:

ID        YEAR      LINK

1          2009       123

1          2010       123

...

 

Its never a good idea to put "data" in column names.

mbuchecker
Quartz | Level 8

You could transpose the first data set, join, and then re-transpose. The following code is not complete but should get you going in the right direction.

 

proc transpose data=work.one out=transone;
id id;
var Dataset2_ID_2009 Dataset2_ID_2010 Dataset2_ID_2011;
run;
data transone2 (drop=_name_);
set transone;
_name_ = substr(right(_name_), 13, 4);
year=input(_name_, 4.);
run;

proc sql;
select *
from transone2, two
where year between year(start_dt) and year(end_dt);
run;

Michelle

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1409 views
  • 2 likes
  • 4 in conversation