I am simulating survival data to mimic an existing dataset. The details of the original dataset are as follows:
1. 2 treatment groups.
2. Patients are followed up for at least 5 years.
3. The event rate at 5 years is around 50% in both groups.
4. An additional 10% (trt 1) and 25% (trt 2) of patients dropped out of the study before completing the full 5-year follow-up.
I am using the Weibull Shape (1.0147) and Weibull Scale (6.4465) parameters from the output of PROC LIFEREG, run separately by group, to simulate the data. The survival time (in years) is capped at 5 years before running the procedure.
The proc lifereg code is as follows:
proc lifereg data=surv;
where group=1;
model surv_time_years*event(0) = / dist=Weibull;
run;
I am using the following line of code: Time=rand('Weibull', 1.0147, 6.4465) to simulate data (with same number of patients) for trt 1.
Although the overall mean time in the simulated data (3.46) is almost the same as in the original data (3.45) for trt 1, the simulation overestimates the number of patients completing 5 years. Around 10% more patients in the simulated dataset have time >= 5y, with some really extreme values not seen in the original dataset. As a consequence, the numbers of patients completing 1, 2, 3, 4 and 5 years differ substantially between the simulated and original data.
I followed other posts on the forum and tried removing the censoring variable 'event(0)' from the model statement when estimating Weibull parameters as suggested in one post, but with no luck. Also, I am in possession of Rick Wicklin's 'Simulating Data with SAS' text and have gone through the relevant sections on simulating survival data.
I also tried simulating the data from other distributions (comparing model fit using AIC/BIC from proc lifereg), but the results were similar or even worse.
I understand this is a simulation and some amount of variation is expected, but I feel I am missing something here.
Any help is appreciated.
Hello @kc,
How did you implement item 4 -- the drop-outs -- in your simulation? In the PharmaSUG 2004 paper "Statistical Simulations for Sample Size Calculation with PROC IML" the author considers the distributions of three times:
Hello,
Maybe @Rick_SAS can help?
It's the Belgian National Holiday over here, so I may not rack my brain 😁.
Cheers,
Koen
PROC LIFEREG uses a different parameterization of the Weibull distribution than the RAND function and PROC UNIVARIATE. You can read about the difference, and how to convert one set of parameters into the other, in "Interpret estimates for a Weibull regression model in SAS".
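For what it's worth, here is a minimal sketch that combines the parameterization point with the drop-out question raised earlier in the thread. As far as I know, the rows labeled "Weibull Scale" and "Weibull Shape" in the PROC LIFEREG output are exp(Intercept) and 1/Scale, which is the form RAND('Weibull', shape, scale) expects, so the two values quoted in the question can be passed to RAND directly; it is the Intercept and Scale rows that need converting. With those values the implied probability of being event-free at 5 years is roughly 46%, so a simulation that draws only event times will show more 5-year completers than a dataset that also lost patients to drop-out. Everything about the drop-out model below is an assumption of mine: the exponential drop-out time, the macro variables n and drop_hazard, and their values are hypothetical placeholders, not estimates from the original data.

```sas
/* Sketch only: the drop-out model is an assumption, not taken from the thread. */
%let shape       = 1.0147;  /* "Weibull Shape" row of PROC LIFEREG output (= 1/Scale)        */
%let scale       = 6.4465;  /* "Weibull Scale" row of PROC LIFEREG output (= exp(Intercept)) */
%let n           = 1000;    /* hypothetical number of patients in trt 1                      */
%let drop_hazard = 0.03;    /* hypothetical yearly drop-out hazard; tune to the observed
                               drop-out proportion                                           */

/* Survival probability at 5 years implied by the fitted Weibull model */
data _null_;
   s5 = exp(-(5 / &scale) ** &shape);
   put "Implied P(event-free at 5 years) = " s5 6.4;
run;

data sim;
   call streaminit(12345);
   do id = 1 to &n;
      t_event = rand('Weibull', &shape, &scale);     /* latent event time (years)           */
      t_drop  = rand('Exponential') / &drop_hazard;  /* latent drop-out time (assumed model) */
      surv_time_years = min(t_event, t_drop, 5);     /* administrative censoring at 5 years  */
      event   = (t_event <= min(t_drop, 5));         /* 1 = event observed, 0 = censored     */
      dropout = (t_drop < min(t_event, 5));          /* 1 = left the study before 5 years    */
      output;
   end;
   keep id surv_time_years event dropout;
run;

/* Compare these proportions with the original data */
proc freq data=sim;
   tables event dropout;
run;
```

Refitting the simulated data with the same PROC LIFEREG call as in the question (model surv_time_years*event(0) = / dist=Weibull) should return shape and scale estimates close to the inputs, which is a quick check that the parameterization is consistent.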