Solved: Re: Cox continuous time-varying intervals

Kastchei · Posted 03-31-2015 12:31 PM

Hey all,

It looks like the intervals, as defined by the SAS documentation, for the counting process style of Cox regression with a time-varying covariate are of the form

(t1, t2]

For measures like weight, which continuously vary, do folks really code it this way, or do most of you code [t1 , t2)?

In my case, I'm looking at weight's effect on an adverse event to a drug. I have weights at various days, depending on when the subject could come in for a visit. Here's an example.

Day	Weight (kg)
0	55.52477493
56	54.88467677
140	54.65788059
229	56.24545388
359	51.48273400
372	53.07030729

Using (t1, t2] means that the first measurement (the one taken when the drug was first used) is never used in the analysis. Using [t1, t2) uses that data, but then the intervals do not correspond correctly with the model. There are some subjects who were never remeasured after Day 0 (perhaps they dropped out of the study sometime later without coming back to the clinic). I don't really want to have to remove them completely from the analysis, which is what would happen if I use (t1, t2].

Secondly, I want to dichotomize this.

Day	Weight (kg)	Weight Category
0	55.52477493	> 55 kg
56	54.88467677	<= 55 kg
140	54.65788059	<= 55 kg
229	56.24545388	> 55 kg
359	51.48273400	<= 55 kg
372	53.07030729	<= 55 kg

Using (t1, t2] would put all the experience in an interval in the same category, which is incorrect. For example, I'm pretty sure on Day 1 she was still > 55 kg; however, (t1, t2] would put Day 1 as <= 55 kg. Similarly, [t1, t2) would have a similar but opposite problem. I'm sure on Day 55 she's <= 55 kg, but [t1, t2) would still place her in > 55 kg. Would it be advisable to interpolate the day where she would change from one category to the next? For my example, around Day 46 (45.91) would be her first change. So then I could categorize (0, 46] as > 55 kg and then (46, whatever] as <= 55 kg.

Thanks for any tips!
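
For reference, the Day 46 figure in the question follows from straight-line interpolation between the Day 0 and Day 56 weights. The DATA step below is only a sketch of that arithmetic; the variable names (cutoff, t1, t2, weight1, weight2, cross_day) are chosen for illustration, and it assumes the two weights lie on opposite sides of the cutoff.

```sas
data _null_;
  cutoff = 55;
  t1 = 0;  weight1 = 55.52477493;   /* Day 0 measurement  */
  t2 = 56; weight2 = 54.88467677;   /* Day 56 measurement */
  /* Day at which the straight line from (t1, weight1) to (t2, weight2)
     reaches the cutoff. Only meaningful when weight1 and weight2 are on
     opposite sides of the cutoff. */
  cross_day = t1 + (weight1 - cutoff) / (weight1 - weight2) * (t2 - t1);
  put cross_day= 8.2;
run;
```

The computed crossing day is about 45.91, which matches the value quoted in the question.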

JacobSimonsen · Posted 04-01-2015 04:33 AM

Hi Kastchei,

The weight observed at the first timepoint is used until the second timepoint. That is easier to see once you have made a table with the exit point added:

entry	exit	weight	category
0	56	55.52477493	> 55 kg
56	140	54.88467677	<= 55 kg
140	229	54.65788059	<= 55 kg
229	359	56.24545388	> 55 kg
359	372	51.48273400	<= 55 kg

In your example, if the last time, which is either an event time or a censoring time, is observed at 372, then the weight at that time is not used.

Alternatively, if you are very ambitious, you can smooth out the weights between the observed timepoints, but that will require the assumption that weight is not affected by event times, because otherwise you will be conditioning on future events.
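
One possible way to build the entry/exit table from one-row-per-visit data is sketched below. Everything about the input is an assumption: a dataset called visits with variables id, day and weight, plus endday (the event or censoring day) and status (1 = event, 0 = censored) repeated on every row of a subject. The sketch has not been run against real data.

```sas
proc sort data=visits;
  by id day;
run;

data intervals;
  /* One-to-one merge with a copy of the data shifted up one row,
     so each visit can see the next visit (look-ahead). */
  merge visits
        visits(firstobs=2 keep=id day rename=(id=next_id day=next_day));
  entry = day;
  if next_id = id and next_day < endday then do;
    exit  = next_day;   /* weight is carried forward until the next visit */
    event = 0;          /* interval is censored at the next visit */
  end;
  else do;
    exit  = endday;     /* last interval ends at the event or censoring day */
    event = status;
  end;
  if entry < endday;    /* drop visits at or after the end of follow-up */
  keep id entry exit weight event;
run;
```

Under these assumptions, a subject measured only on Day 0 still contributes one interval from 0 to endday, so nobody is dropped for lack of a second measurement.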

Kastchei · Posted 04-01-2015 10:04 AM

Thanks! I think I had gotten myself a little confused with event times vs. the covariate times. Indeed, I have to do it like you say, or else some events will get dropped as having a missing covariate (there was no weight taken at the hospital when the event occurred). Thanks for straightening me out.

Could you explain how you would suggest smoothing out the weights between time points?

Michael

JacobSimonsen · Posted 04-01-2015 10:17 AM

I was thinking that the weight used in the model would be the weight the person would have if weight changed linearly between measurements. That is, a weighted average of the weights measured at the two endpoints of an interval, with the most weight given to the nearest endpoint.

In the table above, add two variables, t1 and t2, which are copies of entry and exit (you need to copy them, because the exit variable is used for the running time in the Cox regression). Also add "weight1" as the weight measured at the left endpoint and "weight2" as the weight at the right endpoint.

Further, you need the event variable, which should be 0 for all intervals where the person is censored at the right endpoint. It takes the value 1 only for the last interval, and only if the person has an event at all.

Something like this should work:

proc phreg data=mydata;
  weight=((exit-t1)*weight1+(t2-exit)*weight2)/(t2-t1);
  model (entry exit)*event(0)=weight;
run;

or, if you want to dichotomise:

proc phreg data=mydata;
  weight=( ( (exit-t1)*weight1+(t2-exit)*weight2)/(t2-t1)>55);
  model (entry exit)*event(0)=weight;
run;

Kastchei · Posted 04-01-2015 11:57 AM

This is great! It looks much simpler than what I was trying to come up with, where I was trying to build all those interpolations into new records for each subject (tedious and lots of new intervals!).

I think I am finally starting to understand how the programming steps work as well. Can you confirm whether I have this correct? In your example, the model statement defines which variables hold the interval endpoints; here it's (entry, exit]. As phreg processes, it's only looking at event times (the value of exit when event = 1) in order from smallest to largest, not every single moment in time or every single entry and exit. At any given event time, only records where the event time in question is in the interval (entry, exit] are processed. All others are ignored since they are not applicable at that event time. A record would never be used if no event occurs during its interval?

The programming steps then are only applied to the applicable records, not all the records. This ensures that for an applicable record, the event time is somewhere in the middle of the interval, which is what makes your weight= statement equal a linear interpolation, and not some extrapolation to outside the interval. If the record is the event itself, then we'll select weight2 since (t2 - exit = 0). In your programming statement, exit does not refer to the end of the interval, varying for each record, but exit refers to the event time currently being processed.

If I have that correct, it seems a little confusing that exit is both used to indicate the currently evaluated event time and also the endpoint for the interval. I understand that the event time is the interval endpoint for event records, but I would have thought SAS would have come up with a reserved variable name, like _event_, for that purpose.

Also, I think I want to switch the weights around, right? If the event time is 75% of the way to weight2, I want to apply that 75% (exit - t1) to weight2 rather than to weight1. Right? ((t2-exit)*weight1+(exit-t1)*weight2)/(t2-t1)

Thanks a ton!

JacobSimonsen · Posted 04-01-2015 12:55 PM

It is correct that only time intervals in which an event happens are included in the analysis. That should be understood as follows: if some person has an event at some time, then other persons' intervals covering that time do matter, because those persons were at risk at that time.

The programming steps are applied to every record at each event time that falls within its interval. That means the programming steps are applied multiple times to each record, and the number of evaluations is proportional to N^2 (or rather N x number of events).

Event times are where events occur. Interval endpoints can be either censored or non-censored (equivalent to non-event and event). Typically, all intervals except the last one will be non-events (censored).

About the weighted average, I agree with your statement. I wrote it wrong, you did it right!
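
Putting the correction agreed above into the earlier PROC PHREG sketch gives something like the following. It reuses the dataset and variable names from the earlier post (mydata, entry, exit, t1, t2, weight1, weight2, event), relies on the behaviour described in this thread (inside the programming statements, exit holds the event time currently being evaluated), and has not been run against real data.

```sas
proc phreg data=mydata;
  /* Linear interpolation at the current event time:
     the closer exit is to t2, the more weight2 counts. */
  weight = ((t2 - exit)*weight1 + (exit - t1)*weight2) / (t2 - t1);
  /* For the dichotomised version, replace the line above with:
     weight = (((t2 - exit)*weight1 + (exit - t1)*weight2) / (t2 - t1) > 55); */
  model (entry, exit)*event(0) = weight;
run;
```

With this ordering, an event time equal to t2 yields weight2 and an event time just after t1 yields a value close to weight1.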