I am simulating survival data to mimic an existing dataset. The details of the original dataset are as follows:
1. 2 treatment groups
2. Patients are followed up for at least 5 years.
3. The event rate at 5 years is around 50% in both groups.
4. An additional 10% (trt 1) and 25% (trt 2) of patients dropped out of the study before completing the full 5-year follow-up.
I am using the Weibull Shape (1.0147) and Weibull Scale (6.4465) parameters from the SAS output of proc lifereg, run separately by group, to simulate the data. The survival time (in years) is capped at 5 years before running the procedure.
The proc lifereg code is as follows:
proc lifereg data=surv; where group=1; model surv_time_years*event(0) = / dist=Weibull; run;
I am using the following line of code: Time=rand('Weibull', 1.0147, 6.4465) to simulate data (with same number of patients) for trt 1.
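As a sanity check on these parameters, here is a minimal sketch of what the fitted Weibull implies for the probability that an uncensored draw exceeds 5 years. It assumes the same (shape, scale) parameterization that rand('Weibull', ...) uses; the dataset and variable names are made up for illustration.

```sas
/* Sketch: probability that a Weibull(shape=1.0147, scale=6.4465) draw exceeds 5 years. */
/* Dataset name (check_s5) and variable names are hypothetical.                        */
data check_s5;
  shape = 1.0147;                                /* Weibull Shape from proc lifereg, trt 1 */
  scale = 6.4465;                                /* Weibull Scale from proc lifereg, trt 1 */
  s5_formula = exp(-((5 / scale) ** shape));     /* S(5) = exp(-(t/scale)^shape)           */
  s5_cdf     = 1 - cdf('Weibull', 5, shape, scale);  /* same quantity via the CDF function */
  put s5_formula= s5_cdf=;                       /* both should come out at roughly 0.46   */
run;
```

If that reading is right, roughly 46% of raw draws land at or beyond 5 years before any dropout is applied.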
Although the overall mean time for trt 1 is almost the same in the simulated (3.46) and original (3.45) data, the simulation overestimates the number of patients completing 5 years. Around 10% more patients in the simulated dataset have time >= 5 years, with some very extreme values not seen in the original dataset. As a consequence, the number of patients completing 1, 2, 3, 4, and 5 years differs substantially between the simulated and original data.
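One thing I have not accounted for in the line above is censoring: the raw Weibull draw is a latent event time, whereas the original times are cut off at 5 years and by dropout. Below is a minimal sketch of how both could be layered on top of the draw. The sample size, the seed, and the exponential dropout model with its rate of 0.03 per year are all assumptions for illustration, not values estimated from my data; the rate would need tuning until about 10% of trt 1 patients drop out.

```sas
/* Sketch only: n, the seed, and the dropout model are assumptions, not from the data. */
%let n         = 500;    /* hypothetical number of patients in trt 1                   */
%let drop_rate = 0.03;   /* hypothetical dropout hazard per year; tune to ~10% dropout */

data sim_trt1;
  call streaminit(12345);                          /* arbitrary seed for reproducibility  */
  do id = 1 to &n;
    t_event = rand('Weibull', 1.0147, 6.4465);     /* latent event time (shape, scale)    */
    t_drop  = rand('Exponential') / &drop_rate;    /* latent dropout time, assumed exponential */
    time    = min(t_event, t_drop, 5);             /* observed time, capped at 5 years    */
    event   = (t_event <= min(t_drop, 5));         /* 1 = event observed, 0 = censored    */
    dropout = (t_drop < min(t_event, 5));          /* 1 = dropped out before event or 5y  */
    output;
  end;
  keep id time event dropout;
run;

/* Compare the simulated event, dropout, and completer proportions with the original data */
proc freq data=sim_trt1;
  tables event dropout;
run;
```

With this setup no simulated time can exceed 5 years, and patients with event = 0 and dropout = 0 are the 5-year completers.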
I followed other posts on the forum and tried removing the censoring variable 'event(0)' from the model statement when estimating Weibull parameters as suggested in one post, but with no luck. Also, I am in possession of Rick Wicklin's 'Simulating Data with SAS' text and have gone through the relevant sections on simulating survival data.
I also tried simulating from other distributions (comparing model fit using AIC/BIC from proc lifereg), with similar or even worse results.
I understand this is a simulation and some amount of variation is expected, but I feel I am missing something here.
Any help is appreciated.