Hello, guys. I have run into an issue that I have never faced before. As my statistical knowledge grows, the data analysis tasks also get more complicated. Here is my question: for a time-dependent Cox regression, we would like to use IPTW to control for time-fixed confounders, while keeping the intervention of interest and the time-varying confounders in the time-dependent Cox model itself. What we are unsure about is which dataset the logistic model should be fitted on to get the weight value for each row: the raw data, where each individual is a single row, or the rephrased dataset, where multiple rows come from the same individual but cover different follow-up intervals? Thanks.
Gemini gave this feedback:
You should use the logistic model to calculate the IPT weights on the most raw data where each individual is a single row, provided your time-fixed confounders are truly constant over the entire follow-up period.
The reason for this lies in the goal of IPTW for time-fixed confounding: to balance the initial probability of treatment assignment based on baseline (time-fixed) characteristics.
Here is the breakdown of why and how to handle the data structure:
Why Use the Raw, Single-Row Dataset
The Inverse Probability of Treatment Weighting (IPTW) method, when controlling only for time-fixed confounders, requires modeling the probability of receiving a particular intervention based on the individual's baseline characteristics.
Time-Fixed Treatment: Your intervention of interest is also time-fixed (though its effect is time-dependent). The weights need to reflect the initial treatment decision.
Time-Fixed Confounders: Since your confounders are time-fixed (e.g., age at baseline, sex, disease stage at entry), the propensity score (PS) for an individual remains constant throughout their follow-up.
Simplicity and Consistency: Calculating the PS once per individual on the raw dataset ensures that:
The PS calculation reflects the baseline characteristics that drove the initial treatment assignment.
Every interval (every row) of that individual in the rephrased dataset will receive the exact same weight, preserving the marginal treatment effect estimate.
Correct Procedure (Using Raw Data)
Data Structure: Use the dataset where each row is a unique individual, containing the binary treatment status and all baseline (time-fixed) confounders.
Model: Fit a standard logistic regression model for the probability of receiving the intervention.
logit[P(Treatment=1∣Z)]=α+Zδ
(Z are your time-fixed confounders).
Calculate Weights (Wi): Calculate the ATE or ATT weights for each individual.
Merge: Merge the calculated weight (Wi) onto the rephrased dataset using the individual ID. Each row/interval for the same person receives the identical weight.
Model: Run the time-dependent Cox model using the rephrased data, including the intervention of interest, time-varying confounders, and the calculated weights in the WEIGHT statement.
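The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not a definitive implementation: the toy data, the single confounder `age_std`, the hand-written `fit_logistic` helper, and the choice of unstabilized ATE weights (1/PS for treated, 1/(1-PS) for untreated) are all hypothetical. In practice the logistic model would come from a statistics package.

```python
import math

# Hypothetical raw data: one row per individual, with treatment status and
# one standardized baseline confounder (age_std). All values are made up.
baseline = [
    {"id": 1, "treated": 0, "age_std": -1.5},
    {"id": 2, "treated": 0, "age_std": -1.0},
    {"id": 3, "treated": 1, "age_std": -0.5},
    {"id": 4, "treated": 0, "age_std": -0.2},
    {"id": 5, "treated": 1, "age_std": 0.2},
    {"id": 6, "treated": 0, "age_std": 0.5},
    {"id": 7, "treated": 1, "age_std": 1.0},
    {"id": 8, "treated": 1, "age_std": 1.5},
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(rows, n_iter=2000, lr=0.5):
    """Fit logit P(treated=1 | age_std) = alpha + delta * age_std
    by plain gradient ascent on the mean log-likelihood."""
    alpha, delta = 0.0, 0.0
    n = len(rows)
    for _ in range(n_iter):
        g_alpha = g_delta = 0.0
        for r in rows:
            resid = r["treated"] - sigmoid(alpha + delta * r["age_std"])
            g_alpha += resid
            g_delta += resid * r["age_std"]
        alpha += lr * g_alpha / n
        delta += lr * g_delta / n
    return alpha, delta


# Steps 1-2: fit the propensity model on the one-row-per-individual data.
alpha, delta = fit_logistic(baseline)

# Step 3: propensity score and unstabilized ATE weight, once per individual.
ps_by_id = {}
weight_by_id = {}
for r in baseline:
    ps = sigmoid(alpha + delta * r["age_std"])
    ps_by_id[r["id"]] = ps
    weight_by_id[r["id"]] = 1.0 / ps if r["treated"] == 1 else 1.0 / (1.0 - ps)

# Hypothetical rephrased data: (start, stop] intervals, several rows per id.
long_rows = [
    {"id": 1, "start": 0, "stop": 4, "event": 0},
    {"id": 1, "start": 4, "stop": 9, "event": 1},
    {"id": 3, "start": 0, "stop": 6, "event": 0},
    {"id": 7, "start": 0, "stop": 3, "event": 0},
    {"id": 7, "start": 3, "stop": 5, "event": 0},
    {"id": 7, "start": 5, "stop": 12, "event": 1},
]

# Step 4: merge by id, so every interval of a person gets the same weight.
for row in long_rows:
    row["iptw"] = weight_by_id[row["id"]]

for row in long_rows:
    print(row["id"], row["start"], row["stop"], round(row["iptw"], 3))
```

The `iptw` column is what would then be passed as the weight to the time-dependent Cox model in step 5. With weighted data, a robust (sandwich) variance clustered on the individual ID is commonly recommended, since the weights are estimated and rows from the same person are not independent.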
Why You Should AVOID the Rephrased Dataset for Weights
If you use the rephrased (interval-based) dataset for the logistic regression:
Redundancy: You would be running the same regression (treatment on time-fixed covariates) thousands of times, but the only variation would be the time interval and censoring status, which are not used in the propensity score calculation.
Incorrect Interpretation: A logistic regression on interval data is generally used for time-varying propensity scores, where you model the probability of treatment at time t conditional on time-varying confounders. Since your confounders are time-fixed, this introduces unnecessary complexity and potential confusion without adding statistical value.
Crucial Note: If your model included time-varying confounders and you needed to use sequential (time-varying) IPTW, then you would need to use the rephrased, multiple-row dataset and model the probability of staying on the treatment or switching at each time point. However, your goal is only to control for time-fixed confounders, making the single-row baseline PS calculation the correct choice.
I wonder whether doing it this way will disturb the pseudo-cohort that the IPTW technique creates from the raw dataset. It seems the same weight value will be assigned to every row of the time-dependent dataset, in which an individual may have more than one row if they were followed long enough and their time-varying cofactor changed.
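One way to probe this concern is sketched below, with hypothetical weights and intervals. It assumes the rephrased data are in counting-process form, with non-overlapping (start, stop] intervals per individual. Under that assumption each person contributes at most one row to the risk set at any time t, so the total weight at risk at each time should match the one-row-per-person data, even though the weight is repeated across rows.

```python
# Hypothetical per-person weights and end of follow-up (one row per person).
wide = [
    {"id": 1, "end": 9, "iptw": 1.5},
    {"id": 2, "end": 6, "iptw": 2.0},
    {"id": 3, "end": 12, "iptw": 1.25},
]

# The same people split into (start, stop] intervals, weight repeated per row.
long_rows = [
    {"id": 1, "start": 0, "stop": 4, "iptw": 1.5},
    {"id": 1, "start": 4, "stop": 9, "iptw": 1.5},
    {"id": 2, "start": 0, "stop": 6, "iptw": 2.0},
    {"id": 3, "start": 0, "stop": 3, "iptw": 1.25},
    {"id": 3, "start": 3, "stop": 5, "iptw": 1.25},
    {"id": 3, "start": 5, "stop": 12, "iptw": 1.25},
]


def risk_weight_long(rows, t):
    """Total weight at risk at time t in the interval (long) data."""
    at_risk = [r for r in rows if r["start"] < t <= r["stop"]]
    ids = [r["id"] for r in at_risk]
    # Non-overlapping intervals: nobody appears twice in one risk set.
    assert len(ids) == len(set(ids))
    return sum(r["iptw"] for r in at_risk)


def risk_weight_wide(rows, t):
    """Total weight at risk at time t in the one-row-per-person data."""
    return sum(r["iptw"] for r in rows if t <= r["end"])


# Compare the two weighted risk sets at every integer time point.
comparison = [
    (t, risk_weight_long(long_rows, t), risk_weight_wide(wide, t))
    for t in range(1, 13)
]
for t, w_long, w_wide in comparison:
    print(t, w_long, w_wide)
```

If the two columns agree at every time point, repeating the weight across a person's intervals has not duplicated that person in the pseudo-cohort. This only checks the risk-set bookkeeping; it says nothing about whether the propensity model itself is adequate.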
Thank you!