sasgyro
Fluorite | Level 6

I need to run a survival analysis on data in which patients are stratified by a characteristic and by treatment. I want to test whether patients with this characteristic survive longer on a specific treatment. However, I have a lot of raw data and confounding variables that change over time (blood pressure, medication dosages, etc.). The values of all of these variables are logged for each patient at various points in time during the trial.

What is the best way to organize the data so that I can run the simplest Cox regression analysis on it (if that is even the right method to use)?

If clarification is needed, please let me know.

 

 

Accepted Solutions
sshetter
Fluorite | Level 6

I would recommend setting up your data in 'counting process' style so that your time-varying variables can be readily incorporated. This means creating multiple records per person. The brief notes copied below are from the SURVEYPHREG documentation (a procedure I haven't actually used, since I tend to use PHREG), but they are a good, short description to start you off on this road. It is more 'data work' up front, but I much prefer it to the programming statements that PHREG also allows for time-varying covariates, because I can run more data checks and make sure the setup is appropriate.
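As a rough sketch of that 'data work', the steps below turn visit-level measurements into one record per patient per interval. Everything here is an assumption made for illustration: the dataset names (`visits`, `outcome`, `cp_long`), the variable names, and the values are invented, visit days are assumed to be distinct within a patient, and each measurement is assumed to hold until the next visit.

```sas
/* Hypothetical visit-level data: one row per patient per measurement */
data visits;
  input id visit_day sbp dose;
  datalines;
1  0 140 10
1 30 135 10
1 60 128 20
2  0 150 10
2 45 155 20
;
run;

/* Hypothetical outcomes: one row per patient
   (end_day = day of death or censoring, died = 1 if death) */
data outcome;
  input id end_day died trt marker;
  datalines;
1 75 1 1 1
2 90 0 0 1
;
run;

/* Process each patient's visits from last to first, so that every
   interval can close at the start of the one that follows it */
proc sort data=visits;
  by id descending visit_day;
run;

data cp_long;
  merge visits(in=in_v) outcome(in=in_o);
  by id;
  retain next_start is_last;
  if first.id then do;
    next_start = end_day;   /* the last interval closes at end of follow-up */
    is_last = 1;
  end;
  /* keep only measurements taken before the end of follow-up */
  if in_v and in_o and visit_day < end_day;
  t1 = visit_day;
  t2 = next_start;
  status = died * is_last;  /* only the final interval can carry the event */
  next_start = t1;
  is_last = 0;
  keep id t1 t2 status trt marker sbp dose;
run;

proc sort data=cp_long;
  by id t1;
run;

proc print data=cp_long noobs;
run;
```

For patient 1 this gives the intervals (0,30], (30,60], and (60,75], with status=1 only on the last one. Printing the result is the kind of data check that is harder to do when the time-varying values are computed inside the procedure.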

**COPIED TEXT
Counting Process Style of Input
In the counting process formulation, data for each subject are identified by a triple {N, Y, Z} of counting, at-risk, and covariate processes. N(t) indicates the sum of weights for all events that the subject experiences over the time interval (0, t], Y(t) indicates whether the subject is at risk at time t (1 if at risk and 0 otherwise), and Z(t) is a vector of explanatory variables for the subject at time t. The sample path of N is a step function with jumps at the event times, and N(0) = 0. Unless Z(t) changes continuously with time, the data for each subject can be represented by multiple observations, each of which identifies a semiclosed time interval (t1, t2], the values of the explanatory variables over that interval, and the event status at t2. The subject remains at risk during the interval (t1, t2], and an event might occur at t2. Values of the explanatory variables for the subject remain unchanged in the interval. This style of data input was originated by Therneau (1994).

For example, suppose a patient (ID=1) with an analysis weight of 10 has a tumor recurrence at weeks 3, 10, and 15 and is followed up until week 23. Consider three fixed explanatory variables Trt (treatment), Number (initial tumor number), and Size (initial tumor size), one weight variable Weight (analysis weight), one patient identification variable ID, and one time-dependent covariate Z that represents a hormone level. The value of Z might change during the follow-up period. The data for this patient are represented by the following four observations:

 

[Image sshetter_0-1594652626324.png: table of the four observations for patient ID=1]

Here (T1,T2] contains the at-risk intervals. The variable Status indicates whether a recurrence has occurred at T2: a value of 1 indicates a tumor recurrence, and a value of 0 indicates non-recurrence. Assume the patients are selected independently. Because there are multiple observation rows for every patient, you should use the CLUSTER statement to identify each individual patient. The CLUSTER statement computes the variability between the patients. The following statements fit a multiplicative hazards model with baseline covariates Trt, Number, and Size, and a time-varying covariate Z. For more information, see the section The Multiplicative Hazards Model.

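proc surveyphreg;
  weight Weight;   /* analysis weight */
  cluster ID;      /* rows from the same patient form one cluster */
  /* each row is at risk over (T1,T2]; Status=0 means no event at T2 */
  model (T1,T2) * Status(0) = Trt Number Size Z;
run;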
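
Since PROC PHREG is mentioned above as the usual choice, here is a hedged sketch of how the same counting-process input might be fitted there. Patient 1 follows the copied example (recurrences at weeks 3, 10, and 15, follow-up to week 23, weight 10). The dataset name `tumor`, the values of Trt, Number, Size, and Z, and patients 2 to 4 are all invented for illustration. The data are far too small for meaningful estimates, so this only shows the layout and the syntax.

```sas
/* Patient 1 follows the copied example; every other value is hypothetical */
data tumor;
  input ID T1 T2 Status Trt Number Size Weight Z;
  datalines;
1  0  3 1 1 1 3 10 5.1
1  3 10 1 1 1 3 10 4.8
1 10 15 1 1 1 3 10 5.6
1 15 23 0 1 1 3 10 5.0
2  0  8 1 0 2 1 12 6.2
2  8 30 0 0 2 1 12 6.0
3  0 18 0 1 1 1  8 4.1
4  0  5 1 0 3 2 15 6.9
4  5 12 1 0 3 2 15 7.3
4 12 26 0 0 3 2 15 7.0
;
run;

/* COVS(AGGREGATE) with the ID statement requests a robust variance that
   sums residuals within patient, since one patient contributes many rows */
proc phreg data=tumor covs(aggregate);
  weight Weight;
  id ID;
  model (T1, T2) * Status(0) = Trt Number Size Z;
run;
```

PHREG's WEIGHT statement weights the partial likelihood and is not a survey-design analysis, so results need not match SURVEYPHREG. For a survival endpoint with at most one event per patient, as in the original question, the robust variance is probably unnecessary, and the treatment-by-characteristic question could be expressed as an interaction term in the MODEL statement (for example `Trt*Marker`, where `Marker` is a hypothetical name for the characteristic).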

