TomHsiung
Quartz | Level 8

Hello, everyone

 

It is not uncommon to have to adjust for more than one time-dependent variable in a Cox regression model. A SAS paper describes two methods for coding a time-dependent Cox model, but only for a single time-dependent variable (link: https://support.sas.com/resources/papers/proceedings12/168-2012.pdf).

 

The first method involves constructing a special data set for the time-dependent variable data and the example is for only one time-dependent variable. It does not teach what to do if there is more than one such variable.

 

The second method is more advanced and is termed the "programming statement method". It uses only one record per individual, whereas the first method uses multiple records per individual.

 

I wonder how to code for the Cox model if there are 2 or more time-dependent variables, by both method 1 and 2. Thank you very much.

 

Tom

FreelanceReinh
Jade | Level 19

Hello @TomHsiung,

 

I would think of the two or more time-dependent explanatory variables as one vector-valued variable and then use either of the two methods from the paper (or even both, for validation purposes) to build the model.

 

With counting process style of input: Create an input dataset with one observation per individual and time interval (start, stop] where the combination of all time-dependent explanatory variables (i.e., the "vector") is constant, until one or more of the components of the vector change or a change of status (event or censoring) occurs.
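As a minimal sketch of the counting process style, assuming an input dataset named long with one row per individual and interval, and hypothetical variable names start, stop, status (1 = event, 0 = censored), A, B and U, the model statement could look like this:

```sas
/* Assumed input: one row per individual and interval (start, stop]
   on which both A and B are constant. All names are hypothetical. */
proc phreg data=long;
   model (start, stop) * status(0) = A B U;
run;
```

Adding a third time-dependent variable only means adding it to the right-hand side of the MODEL statement; the extra work is in building the intervals.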

 

Using programming statements: Assign each component of the vector its time-dependent value (using one or more arrays or IF-THEN/ELSE statements or whatever is appropriate).
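A rough sketch of the programming statements approach for two variables follows. It assumes a dataset named wide with one row per individual, a follow-up time futime, an indicator status (1 = event, 0 = censored), and two hypothetical variables a_day and b_day holding the day on which A switches from 0 to 1 and B switches from 1 to 0 (missing if no switch occurred):

```sas
/* Assumed input: one row per individual. All names are hypothetical. */
proc phreg data=wide;
   model futime * status(0) = A B U;
   /* Inside PROC PHREG, futime is the event time of the risk set being
      evaluated, so A and B are recomputed for every risk set. */
   if a_day = . or futime < a_day then A = 0; else A = 1;
   if b_day = . or futime < b_day then B = 1; else B = 0;
run;
```

Each additional time-dependent variable gets its own assignment logic, which is why no restructuring of the data is needed with this method.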

 

Which of the two methods is more convenient depends on the structure and other characteristics of your data.

TomHsiung
Quartz | Level 8

Hi, @FreelanceReinh 

 

Thank you for your suggestion. Hmmm, it's a great idea for dealing with multiple time-dependent variables. Please let me restate your idea.

 

First, via the counting process method. Say, we have two time-varying variables named A and B, and there are other fixed variables which are represented collectively by U. If there is an individual who was followed for 14 days, for whom the A changed on day 7 from 0 to 1. In addition, the B changed on day 4 from 1 to 0. According to my understanding of your idea, there should be three rows for this individual in the overall table and they are:

 

row one: A=0, B=1, start=0, stop=4

row two: A=0, B=0, start=4, stop=7

row three: A=1, B=0, start=7, stop=14

 

Therefore, every change in a time-varying variable produces an extra row for the individual, provided that the time-varying variables do not change on the same day (i.e., there are no ties).

 

I have not yet thought about the programming statement method; if I come up with ideas about it, I will update this post. Again, thank you for your feedback.

 

Tom

FreelanceReinh
Jade | Level 19

@TomHsiung wrote:

Thank you for your suggestion. (...) If there is an individual who was followed for 14 days, for whom the A changed on day 7 from 0 to 1. In addition, the B changed on day 4 from 1 to 0. According to my understanding of your idea, there should be three rows for this individual in the overall table and they are:

 

row one: A=0, B=1, start=0, stop=4

row two: A=0, B=0, start=4, stop=7

row three: A=1, B=0, start=7, stop=14


You're welcome.

 

In a situation with discrete (integer) times t_1, t_2, ..., as in your example, one has to make sure that the time-varying variables at time t_i have the values that are relevant for the case that t_i is the event time of the individual they describe. This is illustrated in Example 92.7 Time-Dependent Repeated Measurements of a Covariate of the PROC PHREG documentation: A value measured at time t_i is assumed to be valid in the entire semiclosed interval (t_{i-1}, t_i] where t_{i-1} is the time of the previous measurement (or zero if i=1). Time t_{i-1} must not be equal to t_i in this situation. Otherwise, PROC PHREG would discard the observation and issue a note about this in the log.

 

So, if you know that "B changed on day 4 from 1 to 0," (and not: "it may have changed earlier, but the first measurement detecting the change happened to be that on day 4") I think it would be more appropriate to have a time interval with stop time 3 and B=1, followed by an interval with start time 3 and B=0. Similarly, knowing that "A changed on day 7 from 0 to 1," the latter interval would rather have stop time 6 and A=0.

start    stop    A    B

  0        3     0    1
  3        6     0    0
  6       14     1    0

If the event or censoring time of that individual was day 7, the third observation would have stop=7 (and the corresponding value of the variable indicating event or censoring, not shown above). Thus, the model would take the potential impact of A=1 on the occurrence probability of the event into account since start=6<7=stop.
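To make the last case concrete, here is how the three observations might be entered if the event occurred on day 7. The variable name status (1 = event, 0 = censored) and the dataset name are assumptions for illustration; only the last interval of an individual can carry the event.

```sas
/* Hypothetical rows for one individual whose event occurred on day 7 */
data example;
   input id start stop status A B;
   datalines;
1 0 3 0 0 1
1 3 6 0 0 0
1 6 7 1 1 0
;
run;
```

If the individual had instead been censored on day 14, the last row would read start=6, stop=14, status=0.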

TomHsiung
Quartz | Level 8

Thanks for the notice, @FreelanceReinh 

 

Great! And I have one more question. If the two time-varying variables change on the same day (tie), how do we count them? Only two rows with the same start and stop time? Thank you.

FreelanceReinh
Jade | Level 19

@TomHsiung wrote:

If the two time-varying variables change on the same day (tie), how do we count them? Only two rows with the same start and stop time?


The general pattern is always the same: Each row represents a semiclosed time interval (start, stop] in which the time-varying variables are constant. If in the previous example not only B changed on day 4 from 1 to 0, but also A from 0 to 1 (and remained constant thereafter), we would specify:

start    stop    A    B

  0        3     0    1
  3       14     1    0

So, up to and including day 3 the "vector" (A, B)=(0, 1), whereas after day 3, i.e., on days 4, 5, ..., 14, (A, B)=(1, 0).

TomHsiung
Quartz | Level 8

@FreelanceReinh Thanks for your further explanation. I guess if we have more than 3 time-varying variables, this approach would be very laborious. In addition, PROC TRANSPOSE might have difficulty converting a wide dataset to a narrow dataset when there is more than one time-varying variable (e.g., A_wk1, A_wk2, ... A_wkm, and B_wk1, B_wk2, ... B_wkn).

FreelanceReinh
Jade | Level 19

@TomHsiung: Sorry for the delayed reply, I was out of the office for a week.

 


@TomHsiung wrote:

I guess if we have more than 3 time-varying variables, this approach would be very laborious.


I don't think I've ever had that many time-varying variables in a Cox model. But since the counting process style of input follows a general pattern -- a change in one of the time-varying variables calls for a new observation in the input dataset -- it should be possible to use DATA step programming logic to create all those observations. See the recent post Re: Counting process time dependent cox model for an example (for discrete times).
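A possible DATA step along these lines, for discrete (daily) times, is sketched below. It is only an illustration under assumptions: the wide dataset, the variable names (futime, event, a_day, b_day) and the direction of the changes (A from 0 to 1, B from 1 to 0, at most one change each) are all hypothetical.

```sas
/* Hypothetical wide input: one row per individual.
   futime = last day of follow-up,
   event  = 1 if the event occurred on futime, 0 if censored,
   a_day  = day on which A changes 0 -> 1 (missing if never),
   b_day  = day on which B changes 1 -> 0 (missing if never). */
data wide;
   input id futime event a_day b_day U;
   datalines;
1 14 0 7 4 1
2 10 1 4 4 0
3  5 1 . . 1
;
run;

data long (keep=id start stop status A B U);
   set wide;
   start = 0;
   do day = 1 to futime;
      /* values in effect on this day */
      A = (a_day ne . and day >= a_day);
      B = (b_day = . or day < b_day);
      /* values in effect on the following day */
      nextA = (a_day ne . and day + 1 >= a_day);
      nextB = (b_day = . or day + 1 < b_day);
      /* close the interval at end of follow-up or just before any change */
      if day = futime or nextA ne A or nextB ne B then do;
         stop   = day;
         status = (day = futime) * event;
         output;
         start  = day;
      end;
   end;
run;
```

For id 1 this should give the intervals (0, 3], (3, 6] and (6, 14]. For id 2, where both variables change on day 4, the tie produces a single cut point at 3. Further time-varying variables would only add more comparisons to the IF condition.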

 


@TomHsiung wrote:

In addition, PROC TRANSPOSE might have difficulty converting a wide dataset to a narrow dataset when there is more than one time-varying variable (e.g., A_wk1, A_wk2, ... A_wkm, and B_wk1, B_wk2, ... B_wkn).


Data transformations from wide to long (and vice versa) have been discussed many times in the SAS Support Communities: please see the search results https://communities.sas.com/t5/forums/searchpage/tab/message?q=%22wide%20to%20long%22&noSynonym=fals... or open a new thread describing your specific problem.
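As one possible starting point, a DATA step with arrays can reshape several groups of weekly variables at once, without PROC TRANSPOSE. The sketch below assumes, purely for illustration, that both A and B were measured in the same three weeks (m = n = 3); the dataset and array names are hypothetical.

```sas
/* Hypothetical wide input with three weekly measurements of A and of B */
data wide_wk;
   input id A_wk1-A_wk3 B_wk1-B_wk3;
   datalines;
1 0 0 1 1 0 0
2 1 1 1 0 0 1
;
run;

/* One output row per individual and week, carrying both A and B */
data long_wk (keep=id week A B);
   set wide_wk;
   array a_vals[3] A_wk1-A_wk3;
   array b_vals[3] B_wk1-B_wk3;
   do week = 1 to dim(a_vals);
      A = a_vals[week];
      B = b_vals[week];
      output;
   end;
run;
```

If the numbers of measurements differ between the variables, the loop bounds and the handling of the missing weeks would have to be adapted to your data.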
