Dear all,
I am conducting a cohort study and would like to get the survival probability estimates, together with their confidence intervals, into a dataset that I can export to draw a failure curve in R.
With the following PROC PHREG code I can output a dataset containing the survival probabilities, but I can't do the same for the confidence intervals. Can you help me, please? I couldn't find a specific solution on the internet. Thank you very much!
proc phreg data=Dataset;
   class group (ref="0");
   model followup*event(0) = / ties=efron rl;   /* EVENT=0 marks censored observations */
   weight sw / normalize;
   strata group;
   output out=Dataset_Surv survival=s;          /* S = estimated survival probability */
run;
Hello @Haemoglobin17,
I think you can use the BASELINE statement instead of the OUTPUT statement:
baseline out=want survival=s l=lcl u=ucl;
Dataset WANT will contain the survival probability estimates (variable S) together with the confidence limits (LCL, UCL).
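Put into the PROC PHREG step from the question, this could look roughly as follows. It is only a sketch: the dataset and variable names (Dataset, followup, event, sw, group) are taken from the original post, WANT, S, LCL and UCL are the illustrative names used above, and LOWER=/UPPER= are the long forms of L=/U=.

```sas
proc phreg data=Dataset;
   class group (ref="0");
   model followup*event(0) = / ties=efron rl;
   weight sw / normalize;
   strata group;
   /* BASELINE takes the place of the OUTPUT statement:            */
   /* S = survival estimate, LCL/UCL = pointwise confidence limits */
   baseline out=want survival=s lower=lcl upper=ucl;
run;

/* Quick look at the first rows of the result */
proc print data=want(obs=10);
run;
```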
That is exactly what I was looking for! I didn't know about the BASELINE statement. Thank you, @FreelanceReinh!
I noticed that when I use the BASELINE statement the output dataset has 3900 observations, while with the OUTPUT statement it has 4 million. What's the reason? The log doesn't show any errors or warnings. When I re-run the program with OUTPUT, the count goes back to 4 million. Is there a way to fix this?
The output dataset from the BASELINE statement is "condensed" in that it contains each survival probability estimate (variable S) only once (per group). If you have tied event times (i.e. duplicate values of variable FOLLOWUP within a group) or censored observations (EVENT=0), where S doesn't change, the dataset created by the OUTPUT statement will contain the corresponding observations, all with the same S value. This redundancy is avoided in the BASELINE output dataset.
If you sort the 4-million-observation output dataset NODUPKEY by GROUP S, the resulting dataset should have very close to 3900 observations, the remaining discrepancies, if any, being "trivial" observations with FOLLOWUP=0 & S=1. No non-trivial values of S should be lost (i.e., be unavailable from the BASELINE statement).
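The check described above could be sketched like this. Dataset_Surv and WANT are the names used earlier in the thread; SURV_UNIQUE is a hypothetical name for the deduplicated copy. Note that NODUPKEY compares the stored values exactly, so S values that differ only at machine precision would still count as distinct.

```sas
/* Keep one observation per distinct survival estimate within each group */
proc sort data=Dataset_Surv(keep=group followup s)
          out=surv_unique nodupkey;
   by group s;
run;

/* Compare the two observation counts */
proc sql;
   select count(*) as n_output_dedup from surv_unique;
   select count(*) as n_baseline     from want;
quit;
```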
You could merge the LCL and UCL values from the BASELINE output dataset to the large output dataset in a DATA step if you need them redundantly "multiplied" as well.
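One possible way to do that merge is sketched below. It assumes that WANT contains the stratum variable GROUP and the time variable FOLLOWUP next to LCL and UCL, and it matches on GROUP and FOLLOWUP rather than on the floating-point value S. Times that have no row in WANT (typically censoring times) inherit the limits of the preceding time point, because the estimated curve is flat there. BIG, CL and SURV_WITH_CL are hypothetical dataset names.

```sas
proc sort data=Dataset_Surv out=big;
   by group followup;
run;

proc sort data=want out=cl(keep=group followup lcl ucl);
   by group followup;
run;

data surv_with_cl;
   merge big(in=in_big)
         cl(in=in_cl rename=(lcl=_lcl ucl=_ucl));
   by group followup;
   retain lcl ucl;
   /* Start each group without carried-over limits */
   if first.group then call missing(lcl, ucl);
   /* A BASELINE row exists for this time point: take its limits */
   if in_cl then do;
      lcl = _lcl;
      ucl = _ucl;
   end;
   /* Otherwise the retained limits from the previous time point carry forward */
   if in_big;   /* keep only the rows of the large OUTPUT dataset */
   drop _lcl _ucl;
run;
```

It is worth checking a few groups by eye afterwards, for example by printing the rows around a censoring time.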
If I leave the repeated survival estimates in the dataset, does this change the survival/failure curve?
If the S values for two different FOLLOWUP times (in the same group) are equal, then at least one of the observations should be a censored observation. These points correspond to a constant ("flat") part of the estimated survival curve. Duplicate S values for two or more observations with the same FOLLOWUP time correspond to only one point of the survival curve.
To draw the survival curve, you don't need the redundant duplicate S values. (You need the censored times for the markers indicating censored observations, though.) Between the points defined by the event times and the corresponding unique S values the curve is just flat.
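As a rough sketch of the plotting step, the condensed BASELINE dataset can be turned into a failure curve (1 - S) and drawn as a step function. The variable names FAILURE, FAILURE_LCL and FAILURE_UCL and the file name failure.csv are placeholders of mine, and the sketch assumes WANT is still ordered by time within GROUP, as BASELINE writes it. The CSV export is only there because the original question mentions plotting in R.

```sas
data failure;
   set want;
   failure     = 1 - s;
   failure_lcl = 1 - ucl;   /* the limits swap when S is flipped to 1 - S */
   failure_ucl = 1 - lcl;
run;

/* Step plot: flat between event times, one curve per group */
proc sgplot data=failure;
   step x=followup y=failure / group=group;
   xaxis label="Follow-up time";
   yaxis label="Cumulative failure probability" min=0;
run;

/* Export for use in R; the path is a placeholder */
proc export data=failure outfile="failure.csv" dbms=csv replace;
run;
```

Markers for censored observations are not included here; they would need the censoring times from the OUTPUT dataset.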