Kastchei
Pyrite | Level 9

Hey all,

It looks like the intervals, as defined by the SAS documentation, for the counting process style of Cox regression with a time-varying covariate are

(t1, t2]

For measures like weight, which vary continuously, do folks really code it this way, or do most of you code [t1, t2)?

In my case, I'm looking at weight's effect on an adverse event to a drug.  I have weights at various days, depending on when the subject could come visit.  Here's an example.

Day    Weight (kg)
0      55.52477493
56     54.88467677
140    54.65788059
229    56.24545388
359    51.48273400
372    53.07030729

Using (t1, t2] means that the first measurement (the measurement when drug first used) is never used in the analysis.  Using [t1, t2) uses that data, but then the intervals do not correspond correctly with the model.  There are some subjects who never were remeasured after Day 0 (perhaps they dropped out of the study sometime after without coming back to the clinic).  I don't really want to have to remove them completely from analysis, which is what would happen if I use (t1, t2].

Secondly, I want to dichotomize this.

Day    Weight (kg)    Weight Category
0      55.52477493    >  55 kg
56     54.88467677    <= 55 kg
140    54.65788059    <= 55 kg
229    56.24545388    >  55 kg
359    51.48273400    <= 55 kg
372    53.07030729    <= 55 kg

Using (t1, t2] would put all the experience in an interval in the same category, which is not always correct.  For example, I'm pretty sure on Day 1, she was still > 55 kg; however, (t1, t2] would put Day 1 as <= 55 kg.  Similarly, [t1, t2) would have a similar but opposite problem.  I'm sure on Day 55, she's <= 55 kg, but [t1, t2) would still place her in > 55 kg.  Would it be advisable to interpolate the day where she would change from one category to the next?  For my example, around Day 46 (45.91) would be her first change.  So then I could categorize (0, 46] as > 55 kg and then (46, whatever] as <= 55 kg.
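
The Day 45.91 figure can be checked with a few lines of arithmetic. The following is a minimal Python sketch (not SAS), assuming weight moves in a straight line between two consecutive visits; the function name `crossing_day` is made up for illustration.

```python
def crossing_day(day1, weight1, day2, weight2, threshold):
    """Day at which a straight line between two measurements hits the threshold.

    Returns None when the weight does not cross the threshold between the visits.
    """
    if (weight1 - threshold) * (weight2 - threshold) > 0:
        return None  # both measurements are on the same side of the threshold
    if weight1 == weight2:
        return None  # flat segment sitting on the threshold: no single crossing day
    fraction = (weight1 - threshold) / (weight1 - weight2)
    return day1 + fraction * (day2 - day1)


# First two visits from the example: Day 0 and Day 56, threshold 55 kg.
first_change = crossing_day(0, 55.52477493, 56, 54.88467677, 55.0)
print(round(first_change, 2))  # 45.91
```

Between Day 56 and Day 140 both weights are below 55 kg, so the same function returns None for that pair of visits.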

Thanks for any tips!

1 ACCEPTED SOLUTION

Accepted Solutions
JacobSimonsen
Barite | Level 11

Hi Kastchei,

The weight observed at the first timepoint is used until the second timepoint. That is easier to see once you have made a table where the exit point is added:

entry    exit    weight         category
0        56      55.52477493    >  55 kg
56       140     54.88467677    <= 55 kg
140      229     54.65788059    <= 55 kg
229      359     56.24545388    >  55 kg
359      372     51.48273400    <= 55 kg

In your example, if the last time (either an event time or a censoring time) is observed at 372, then the weight measured at that time is not used.
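
As an illustration of how the entry/exit table follows from the visit data, here is a minimal Python sketch (not SAS). It assumes each weight is carried forward from its visit day to the next visit day, and that the last interval ends at the event or censoring day; the function name `to_intervals` and the argument `last_day` are made up for this example.

```python
def to_intervals(visits, last_day):
    """Turn (day, weight) visit rows into (entry, exit, weight) intervals.

    Each weight is carried forward from its visit day to the next visit day.
    The final interval ends at last_day (the event or censoring day).
    """
    visits = sorted(visits)
    intervals = []
    for i, (day, weight) in enumerate(visits):
        if day >= last_day:
            break  # a weight taken at or after the last day covers no follow-up
        next_day = visits[i + 1][0] if i + 1 < len(visits) else last_day
        intervals.append((day, min(next_day, last_day), weight))
    return intervals


visits = [(0, 55.52477493), (56, 54.88467677), (140, 54.65788059),
          (229, 56.24545388), (359, 51.48273400), (372, 53.07030729)]
rows = to_intervals(visits, last_day=372)
for entry, exit_, weight in rows:
    category = ">  55 kg" if weight > 55 else "<= 55 kg"
    print(entry, exit_, weight, category)
```

A subject measured only on Day 0 still contributes one interval, from Day 0 to the event or censoring day, so such subjects are not dropped.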

Alternatively, if you are very ambitious, you can smooth out the weights between the observed timepoints, but that requires the assumption that weight is not affected by event times, because otherwise you will be conditioning on future events.


Kastchei
Pyrite | Level 9

Thanks!  I think I had gotten myself a little confused with event times vs. the covariate times.  Indeed, I have to do it like you say, or else some events will get dropped as having a missing covariate (there was no weight taken at the hospital when the event occurred).  Thanks for straightening me out.

Could you explain how you would suggest smoothing out the weights between time points?

Michael

JacobSimonsen
Barite | Level 11

I was thinking that the weight used in the model would be the weight the person would have if weight changed linearly between measurements. That is, a weighted average of the weights measured at the two endpoints of an interval, with the most weight given to the nearest endpoint.

In the table above, add two variables t1 and t2 that are copies of entry and exit (you need to copy them, because the exit variable is used as the running time in the Cox regression). Also add "weight1", the weight measured at the left endpoint, and "weight2", the weight measured at the right endpoint.

Further, you need the event variable, which should be 0 for all intervals where the person is censored at the right endpoint. It takes the value 1 only in the last interval, and only if the person has an event at all.

Something like this should work:

proc phreg data=mydata;

  weight=((exit-t1)*weight1+(t2-exit)*weight2)/(t2-t1);

  model (entry exit)*event(0)=weight;

run;

or if you will dichotomise:

proc phreg data=mydata;

  weight=( ( (exit-t1)*weight1+(t2-exit)*weight2)/(t2-t1)>55);

  model (entry exit)*event(0)=weight;

run;

Kastchei
Pyrite | Level 9

This is great!  It looks much simpler than what I was trying to come up with, where I was trying to build all those interpolations into new records for each subject (tedious and lots of new intervals!).

I think I finally am starting to understand how the programming steps work as well.  Can you confirm if I have this correct?  In your example, the model statement defines which variables have the interval endpoints; here it's (entry, exit].  As phreg processes, it's only looking at event times (the value of exit when event = 1) in order from smallest to largest, not every single moment in time or every single entry and exit.  At any given event time, only records where the event time in question is in the interval (entry, exit] are processed.  All others are ignored since they are not applicable at that event time.  So a record would never be used if no event occurs during its interval?

The programming steps then are only applied to the applicable records, not all the records.  This ensures that for an applicable record, the event time is somewhere in the middle of the interval, which is what makes your weight= statement equal a linear interpolation, and not some extrapolation to outside the interval.  If the record is the event itself, then we'll select weight2 since (t2 - exit = 0).  In your programming statement, exit does not refer to the end of the interval, varying for each record, but exit refers to the event time currently being processed.

If I have that correct, it seems a little confusing that exit is both used to indicate the currently evaluated event time and also the endpoint for the interval.  I understand that the event time is the interval endpoint for event records, but I would have thought SAS would have come up with a reserved variable name, like _event_, for that purpose.

Also, I think I want to switch the weights around, right?  If event time is 75% the way to weight2, I want to apply that 75% to weight2 (exit - t1) rather than to weight1.  Right?  ((t2-exit)*weight1+(exit-t1)*weight2)/(t2-t1)
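
To see the difference between the two versions of the formula, here is a small Python sketch (not SAS) with made-up endpoint values. The function names and the numbers 0, 100, 50.0 and 60.0 are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def interpolated_weight(t, t1, t2, weight1, weight2):
    """Linear interpolation between (t1, weight1) and (t2, weight2) at time t."""
    return ((t2 - t) * weight1 + (t - t1) * weight2) / (t2 - t1)


def swapped_weight(t, t1, t2, weight1, weight2):
    """The earlier version of the formula, with the two multipliers swapped."""
    return ((t - t1) * weight1 + (t2 - t) * weight2) / (t2 - t1)


# Hypothetical interval: 50 kg at day 0 and 60 kg at day 100.
t1, t2, w1, w2 = 0, 100, 50.0, 60.0

# At day 75 the event time is 75% of the way to weight2.
print(interpolated_weight(75, t1, t2, w1, w2))  # 57.5
print(swapped_weight(75, t1, t2, w1, w2))       # 52.5
```

The interpolated version returns weight1 at t1 and weight2 at t2, while the swapped version returns them the other way around.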

Thanks a ton!

JacobSimonsen
Barite | Level 11

It is correct that only time intervals in which an event happens are included in the analysis. That should be understood in this way: if some person has an event at some time, then other persons' intervals covering that time do matter, because those persons were at risk at that time.

The programming steps are applied to every record that is at risk when an event time occurs. That means the programming steps are applied multiple times to each record, and the total number of evaluations is proportional to N^2 (or rather N x number-of-events).
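
The counting can be illustrated with a toy example. This is a Python sketch of the idea only, not a description of how PROC PHREG is implemented internally; the records and the function name `evaluation_count` are made up, and tied event times are collapsed into one for simplicity.

```python
def evaluation_count(records):
    """Count (record, event time) pairs where the record is at risk.

    records: list of (entry, exit, event) rows; event is 1 or 0.
    A record is at risk at event time t when entry < t <= exit.
    """
    event_times = sorted({exit_ for _, exit_, event in records if event == 1})
    count = 0
    for t in event_times:
        for entry, exit_, _ in records:
            if entry < t <= exit_:
                count += 1
    return count


# Hypothetical counting-process rows: (entry, exit, event).
records = [
    (0, 56, 0), (56, 120, 1),   # subject A: event at day 120
    (0, 90, 0), (90, 200, 1),   # subject B: event at day 200
    (0, 150, 0),                # subject C: censored at day 150
]
print(evaluation_count(records))  # 4
```

At day 120 three records are at risk, and at day 200 only one is. The rows (0, 56] and (0, 90] contain no event time, so they are never evaluated in this toy example.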

Event times are where events occur. Interval endpoints can be either censored or non-censored (equivalent to non-event and event). Typically, all intervals except the last one will be non-events (censored).

About the weighted average, I agree with your correction. I wrote it wrong, you did it right!
