BookmarkSubscribeRSS Feed
rosiecarey
Calcite | Level 5

I have a complicated question that I am struggling to describe, let alone figure out how to code.

My data is in long format, i.e., I have a row for every time an event happened to a certain person. YEAR is the year the event occurred. I also have a variable that I will call X, which was measured at different survey waves. For example, X07 was measured in 2007, and X16 was measured in 2016.
I want to be able to create an X_CLOSE variable that has the value to the corresponding X that occurred temporally closest in time to the event. 

Given my example data below, person 01 had an event in 2006, 2009, and 2016. For their 2006 event, I want X_CLOSE to take the value of X07, as 2007 was the closest X measurement to the event. For their 2009 event, X_close should =X10. For their 2016 event, X16 was temporally closest but is missing, so I X_close should =X15. 

How can I code this? I have more than 50 thousand rows of data, 30 outcome years, and 12 X years, so it is not practical to manually code.
My challenge is that the X data is coming from forwards or backward in time. If I only wanted X measurements prior to the outcome, I would use an array to carry the last observation forwards and use only Xs prior to the event. 

 

DATA have;
	INPUT ID YEAR X07 X10 X13 X14 X15 X16;
	DATALINES;
	01 2006 1 2 1 1 1 .;
	01 2009 1 2 1 1 1 .;
	01 2016 1 2 1 1 1 .;
	02 2008 1 3 3 3 5 5;
	02 2010 1 3 3 3 5 5;
	02 2016 1 3 3 3 5 5;
	03 2018 1 2 1 2 1 3;
RUN;


data want;
	INPUT ID YEAR X_CLOSE X07 X10 X13 X14 X15 X16;
	DATALINES;
	01 2006 1 1 2 1 1 1 .;
	01 2009 2 1 2 1 1 1 .;
	01 2016 1 1 2 1 1 1 .;
	02 2008 1 1 3 3 3 5 5;
	02 2010 3 1 3 3 3 5 5;
	02 2016 5 1 3 3 3 5 5;
	03 2018 3 1 2 1 2 1 3;
run;
2 REPLIES 2
ballardw
Super User

First thing, ensure that your data step code RUNS properly. Since you terminated ever line of the datalines block with a semicolon every line generates errors like this:

135     01 2006 1 2 1 1 1 .;
        --
        180

ERROR 180-322: Statement is not valid or it is used out of proper order.

Remove all the ; in the datalines. You place one ; after the last line of data to terminate the block of data (or 4 semicolons if using the DATALINES4 statement).

 

 

Rules, not examples. What do you want if the "closest" is two values because they are the same distance from the year such as with something like this where the year 2015 is missing but you values for X14 and X16?

	02 2015 1 3 3 3 . 5

State the RULE for selecting the value. If you say "X14" that does not tell use what the response might be when the choice is between X12 and X16 or any other such pair for some reason.

Since you say have 30 X values is there some point at which a single value might be "too far" from the year for actual use? If so what is that difference and what to do?

 

 

Really wonder if X07 is 2007 and X10 is 2010 why there are no X08, X09, etc. Seems like a very FUNKY data structure.

rosiecarey
Calcite | Level 5

First thing, ensure that your data step code RUNS properly

Thank you, I am new to SAS and have not tried to create an example dataset before and didn't realize the ;s would cause errors.

Really wonder if X07 is 2007 and X10 is 2010 why there are no X08, X09, etc. Seems like a very FUNKY data structure.

Yes, one of the many challenges with this data is that the X measurements are *not* evenly spaced. At different points, they have occurred at 1, 2, 3, or 4 year intervals. It is, as you say, very funky.

 

State the RULE for selecting the value. If you say "X14" that does not tell use what the response might be when the choice is between X12 and X16 or any other such pair for some reason.

I would prefer to default to the earlier measurement when two measurements are equidistant from the event. I don't know to articulate this as a rule.

Since you say have 30 X values is there some point at which a single value might be "too far" from the year for actual use? If so what is that difference and what to do?

For now, nothing is too far. In an ideal world, I would be able to create a second variable that indicated how far the X measurement was from the event, but I have been focused on just trying to figure out how to make my main variable. 

 

Thank you, again. 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 638 views
  • 0 likes
  • 2 in conversation