TomHsiung
Lapis Lazuli | Level 10

Hello, guys. I have run into an issue that I have never faced before. As my statistical knowledge grows, my data analysis tasks are also getting more complicated. Here is my question: for a time-dependent Cox regression, we would like to use IPTW to control for time-fixed confounders, while leaving the intervention of interest and the time-varying confounders in the time-dependent Cox model itself. What we are uncertain about is which dataset the logistic model should be fitted on to get the weight for each row: the raw data, where each individual is a single row, or the rephrased dataset, where the same individual contributes multiple rows covering different follow-up intervals? Thanks.

 

Gemini's feedback was this:

You should use the logistic model to calculate the IPT weights on the most raw data where each individual is a single row, provided your time-fixed confounders are truly constant over the entire follow-up period.

The reason for this lies in the goal of IPTW for time-fixed confounding: to balance the initial probability of treatment assignment based on baseline (time-fixed) characteristics.

Here is the breakdown of why and how to handle the data structure:


 

Why Use the Raw, Single-Row Dataset

 

The Inverse Probability of Treatment Weighting (IPTW) method, when controlling only for time-fixed confounders, requires modeling the probability of receiving a particular intervention based on the individual's baseline characteristics.

  1. Time-Fixed Treatment: Your intervention of interest is also time-fixed (though its effect is time-dependent). The weights need to reflect the initial treatment decision.

  2. Time-Fixed Confounders: Since your confounders are time-fixed (e.g., age at baseline, sex, disease stage at entry), the propensity score (PS) for an individual remains constant throughout their follow-up.

  3. Simplicity and Consistency: Calculating the PS once per individual on the raw dataset ensures that:

    • The PS calculation reflects the baseline characteristics that drove the initial treatment assignment.

    • Every interval (every row) of that individual in the rephrased dataset will receive the exact same weight, preserving the marginal treatment effect estimate.
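
For reference, the usual textbook forms of the baseline propensity score and of the ATE and ATT weights commonly used with IPTW can be written as below. The symbols are notation assumed here and do not come from the original post: $A_i$ is the binary treatment indicator, $X_i$ the baseline (time-fixed) confounders, and $e(X_i)$ the propensity score of individual $i$.

```latex
e(X_i) = \Pr(A_i = 1 \mid X_i)

w_i^{\mathrm{ATE}} = \frac{A_i}{e(X_i)} + \frac{1 - A_i}{1 - e(X_i)}

w_i^{\mathrm{ATT}} = A_i + \frac{(1 - A_i)\, e(X_i)}{1 - e(X_i)}
```

Because none of these quantities depends on follow-up time, the weight is a single number per individual, which is why every interval row of that person would carry the same value.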

 

Correct Procedure (Using Raw Data)

 

  1. Data Structure: Use the dataset where each row is a unique individual, containing the binary treatment status and all baseline (time-fixed) confounders.

  2. Model: Fit a standard logistic regression model for the probability of receiving the intervention:

    $\operatorname{logit} P(A = 1 \mid X) = \beta_0 + \beta^{\top} X$ ($A$ is the intervention and $X$ are your time-fixed confounders).

  3. Calculate Weights: Calculate the ATE or ATT weights for each individual.

  4. Merge: Merge the calculated weight onto the rephrased dataset using the individual ID. Each row/interval for the same person receives the identical weight.

  5. Model: Run the time-dependent Cox model using the rephrased data, including the intervention of interest, time-varying confounders, and the calculated weights in the WEIGHT statement.
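
The steps above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than SAS, and everything in it is an assumption made for the example: the toy data, the variable names (`baseline`, `long_rows`, `age`, `tv_cov`), the single confounder, the hand-rolled gradient-ascent fit, and the choice of ATE weights. It only shows the data flow: fit the propensity model on one row per person, then copy each person's weight onto all of their interval rows.

```python
import math

# Hypothetical baseline data: one row per individual (names are illustrative).
baseline = [
    {"id": 1, "treated": 1, "age": 72},
    {"id": 2, "treated": 1, "age": 65},
    {"id": 3, "treated": 0, "age": 70},
    {"id": 4, "treated": 1, "age": 58},
    {"id": 5, "treated": 0, "age": 55},
    {"id": 6, "treated": 0, "age": 62},
    {"id": 7, "treated": 1, "age": 68},
    {"id": 8, "treated": 0, "age": 50},
]

def design_row(person):
    # Intercept plus age centered at 60 and scaled to decades.
    return [1.0, (person["age"] - 60) / 10.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, y, lr=0.1, n_iter=5000):
    """Plain gradient ascent on the logistic log-likelihood."""
    beta = [0.0] * len(rows[0])
    for _ in range(n_iter):
        preds = [sigmoid(sum(b * x for b, x in zip(beta, r))) for r in rows]
        for j in range(len(beta)):
            grad = sum((yi - pi) * r[j] for yi, pi, r in zip(y, preds, rows))
            beta[j] += lr * grad / len(y)
    return beta

# Steps 1-2: propensity model fitted on the one-row-per-person data.
X = [design_row(p) for p in baseline]
y = [p["treated"] for p in baseline]
beta = fit_logistic(X, y)
ps = {p["id"]: sigmoid(sum(b * x for b, x in zip(beta, design_row(p))))
      for p in baseline}

# Step 3: ATE weights, 1/e for treated and 1/(1-e) for untreated.
iptw = {p["id"]: (1.0 / ps[p["id"]] if p["treated"] else 1.0 / (1.0 - ps[p["id"]]))
        for p in baseline}

# Hypothetical rephrased (interval) data: several rows per individual.
long_rows = [
    {"id": 1, "tstart": 0, "tstop": 6, "event": 0, "tv_cov": 0},
    {"id": 1, "tstart": 6, "tstop": 14, "event": 1, "tv_cov": 1},
    {"id": 3, "tstart": 0, "tstop": 9, "event": 0, "tv_cov": 0},
    {"id": 3, "tstart": 9, "tstop": 20, "event": 0, "tv_cov": 1},
    {"id": 5, "tstart": 0, "tstop": 11, "event": 1, "tv_cov": 0},
]

# Step 4: merge by id, so every interval of a person gets the same weight.
for row in long_rows:
    row["iptw"] = iptw[row["id"]]
```

The weighted time-dependent Cox model of step 5 would then be fitted on the interval rows, using the merged column as the weight.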


 

Why You Should AVOID the Rephrased Dataset for Weights

 

If you use the rephrased (interval-based) dataset for the logistic regression:

  1. Redundancy: You would be running the same regression (treatment on time-fixed covariates) thousands of times, but the only variation would be the time interval and censoring status, which are not used in the propensity score calculation.

  2. Incorrect Interpretation: A logistic regression on interval data is generally used for time-varying propensity scores, where you model the probability of treatment at time t conditional on time-varying confounders. Since your confounders are time-fixed, this introduces unnecessary complexity and potential confusion without adding statistical value.

Crucial Note: If your model included time-varying confounders and you needed to use sequential (time-varying) IPTW, then you would need to use the rephrased, multiple-row dataset and model the probability of staying on the treatment or switching at each time point. However, your goal is only to control for time-fixed confounders, making the single-row baseline PS calculation the correct choice.

 

I wonder whether doing it this way will disturb the pseudo-cohort that the IPTW technique creates from the raw dataset. The same weight value would be assigned to every row of the time-dependent dataset, in which an individual may have more than one row if they were followed long enough and their time-varying covariate changed.

 

Thank you!

 

 

5 REPLIES
quickbluefish
Barite | Level 11

I can't say that I've heard of someone doing this before (calculating the weights only from the time-fixed variables), but I think the Gemini advice actually makes sense.  I've only used IPTW in a longitudinal dataset with weights that varied over time because in those cases, it was reasonable to think that the probability of treatment could be influenced by the time-varying covariates, but if that's not a concern in your data, I would say your approach seems logical.  And yes (re: your last sentence) - everyone would have the same weight in your case regardless of the timepoint, but I think that's what you want based on your initial description.  All that said, I only "play a statistician on TV", so hopefully others will chime in.  

TomHsiung
Lapis Lazuli | Level 10

Thanks for your feedback, pal.

 

I have never heard of the time-dependent IPTW. I have read thousands of clinical study publications but not yet encountered a case with the time-based IPTW technique. It must be very cool.

quickbluefish
Barite | Level 11

This was a long time ago and I don't remember the details, but it was a marginal structural model, and the weight used at timepoint (n) was the cumulative product of all weights from T(1) to T(n) for that individual.  The weights were a real mess to deal with because of this method as it would very often result in extreme values that needed to either be trimmed or otherwise dealt with.  
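
The cumulative-product weighting described in this reply can be sketched as below. The per-interval weights, the person labels, and the fixed truncation bounds are all made-up values for illustration; in practice the bounds are often taken from percentiles of the weight distribution, and this is not a reconstruction of the original analysis.

```python
# Hypothetical per-interval weights for two people, ordered by time.
interval_weights = {
    "A": [1.0, 1.5, 0.5],
    "B": [2.0, 2.5, 3.0, 2.0],
}

def cumulative_weights(weights):
    """Weight at timepoint n is the product of the weights from T(1) to T(n)."""
    out, running = [], 1.0
    for w in weights:
        running *= w
        out.append(running)
    return out

def truncate(values, lower, upper):
    """Clamp extreme weights to the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

cumulative = {pid: cumulative_weights(ws) for pid, ws in interval_weights.items()}

# Arbitrary illustrative bounds for truncation.
truncated = {pid: truncate(ws, 0.1, 10.0) for pid, ws in cumulative.items()}
```

Person B shows how the product can grow quickly even when no single interval weight is extreme, which is the kind of behavior that makes trimming or truncation necessary.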

Ksharp
Super User

PROC PHREG supports a RANDOM statement, which could take TIME as a random effect.

But since you want a time-dependent Cox regression, it looks like a combination of time series analysis and a Cox model.

Maybe you should check the SAS/ETS documentation.

 

[Attached image: Ksharp_0-1760322873894.png]

 

TomHsiung
Lapis Lazuli | Level 10
Thanks, pal. Hmmm, it looks like we should avoid the IPTW technique for the time-dependent Cox model.
