I have a complicated question that I am struggling to describe, let alone figure out how to code. My data is in long format, i.e., I have a row for every time an event happened to a certain person. YEAR is the year the event occurred. I also have a variable that I will call X, which was measured at different survey waves. For example, X07 was measured in 2007, and X16 was measured in 2016. I want to be able to create an X_CLOSE variable that has the value to the corresponding X that occurred temporally closest in time to the event. Given my example data below, person 01 had an event in 2006, 2009, and 2016. For their 2006 event, I want X_CLOSE to take the value of X07, as 2007 was the closest X measurement to the event. For their 2009 event, X_close should =X10. For their 2016 event, X16 was temporally closest but is missing, so I X_close should =X15. How can I code this? I have more than 50 thousand rows of data, 30 outcome years, and 12 X years, so it is not practical to manually code. My challenge is that the X data is coming from forwards or backward in time. If I only wanted X measurements prior to the outcome, I would use an array to carry the last observation forwards and use only Xs prior to the event. DATA have;
INPUT ID YEAR X07 X10 X13 X14 X15 X16;
DATALINES;
01 2006 1 2 1 1 1 .;
01 2009 1 2 1 1 1 .;
01 2016 1 2 1 1 1 .;
02 2008 1 3 3 3 5 5;
02 2010 1 3 3 3 5 5;
02 2016 1 3 3 3 5 5;
03 2018 1 2 1 2 1 3;
RUN;
data want;
INPUT ID YEAR X_CLOSE X07 X10 X13 X14 X15 X16;
DATALINES;
01 2006 1 1 2 1 1 1 .;
01 2009 2 1 2 1 1 1 .;
01 2016 1 1 2 1 1 1 .;
02 2008 1 1 3 3 3 5 5;
02 2010 3 1 3 3 3 5 5;
02 2016 5 1 3 3 3 5 5;
03 2018 3 1 2 1 2 1 3;
run;
... View more