Haemoglobin17
Obsidian | Level 7

Dear all,

I have been conducting a cohort study, and I would like to get the survival probability estimates with confidence intervals in a dataset that I can export in order to create a failure curve in R.

With the following PROC PHREG code I can output a dataset with the survival probability estimates, but I can't do the same for the confidence intervals. Can you help me, please? I couldn't find a specific solution on the internet. Thank you very much!


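proc phreg data=Dataset;
  class group (ref="0");
  /* No covariates in the model; the curves are estimated per stratum */
  model followup*event(0)= / ties=efron rl;
  weight sw / normalize;
  strata group;
  /* SURVIVAL= names the variable holding the survival estimates */
  output out=Dataset_Surv survival=s;
run;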

FreelanceReinh
Jade | Level 19

Hello @Haemoglobin17,

I think you can use the BASELINE statement instead of the OUTPUT statement:

baseline out=want survival=s l=lcl u=ucl;

Dataset WANT will contain the survival probability estimates (variable S) together with the confidence limits (LCL, UCL).
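
For context, here is a minimal sketch of how the BASELINE statement might slot into the PROC PHREG step from the question. It assumes the dataset and variable names from the original post (Dataset, group, followup, event, sw). The PROC EXPORT step and its CSV path are not part of the thread; they are a hypothetical illustration of how WANT could be handed over to R.

```sas
proc phreg data=Dataset;
  class group (ref="0");
  model followup*event(0)= / ties=efron rl;
  weight sw / normalize;
  strata group;
  /* S = survival estimate, LCL/UCL = lower/upper confidence limits */
  baseline out=want survival=s lower=lcl upper=ucl;
run;

/* Hypothetical export step so the estimates can be read into R */
proc export data=want
  outfile="/path/to/want.csv"   /* assumed path, adjust as needed */
  dbms=csv
  replace;
run;
```

L= and U= in the reply are the short forms of LOWER= and UPPER=, so both spellings should give the same dataset.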

Haemoglobin17
Obsidian | Level 7

It is exactly what I was looking for! I didn't know about the BASELINE statement. Thank you, @FreelanceReinh!

Haemoglobin17
Obsidian | Level 7

I noticed that when I use the BASELINE statement the output dataset has 3900 observations, while with the OUTPUT statement it has 4 million. What is the reason? The log doesn't show any errors or warnings. When I re-run the program with the OUTPUT statement, it goes back to 4 million. Is there a way to fix this?

FreelanceReinh
Jade | Level 19

@Haemoglobin17 wrote:

I noticed that when I use the BASELINE statement the output dataset has 3900 observations, while with the OUTPUT statement it has 4 million. What is the reason? The log doesn't show any errors or warnings. When I re-run the program with the OUTPUT statement, it goes back to 4 million. Is there a way to fix this?


The output dataset from the BASELINE statement is "condensed" in that it contains each survival probability estimate (variable S) only once per group. If you have tied event times (i.e., duplicate values of variable FOLLOWUP within a group) or censored observations (EVENT=0), where S doesn't change, the dataset created by the OUTPUT statement will contain the corresponding observations, all with the same S value. This redundancy is avoided in the BASELINE output dataset.

If you sort the 4-million-observation output dataset with NODUPKEY by GROUP and S, the resulting dataset should have very close to 3900 observations. The remaining discrepancies, if any, should be "trivial" observations with FOLLOWUP=0 and S=1. No non-trivial values of S should be lost (i.e., be unavailable from the BASELINE statement).
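
The check described above could be sketched as follows. The dataset names Dataset_Surv and WANT come from the thread; surv_unique and the two count column names are hypothetical names chosen for illustration.

```sas
/* Keep one row per distinct (GROUP, S) pair from the large OUTPUT dataset */
proc sort data=Dataset_Surv out=surv_unique nodupkey;
  by group s;
run;

/* Compare the number of rows with the BASELINE dataset */
proc sql;
  select count(*) as n_unique_output from surv_unique;
  select count(*) as n_baseline from want;
quit;
```

The original Dataset_Surv is left untouched because the sorted result is written to a separate dataset via OUT=.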

 

You could merge the LCL and UCL values from the BASELINE output dataset to the large output dataset in a DATA step if you need them redundantly "multiplied" as well.
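
One possible way to do this is sketched below; it is not necessarily what the reply had in mind. Instead of matching on the floating-point value of S, it interleaves the two datasets by GROUP and FOLLOWUP and carries the most recent confidence limits forward to every observation. It assumes that WANT contains the variables group, followup, lcl and ucl, and that Dataset_Surv does not already contain variables named lcl or ucl; the names base_sorted, surv_sorted and surv_with_cl are hypothetical.

```sas
proc sort data=want(keep=group followup lcl ucl) out=base_sorted;
  by group followup;
run;

proc sort data=Dataset_Surv out=surv_sorted;
  by group followup;
run;

data surv_with_cl;
  /* Interleave: at equal times the BASELINE row is read first */
  set base_sorted(in=inbase rename=(lcl=_lcl ucl=_ucl))
      surv_sorted(in=inobs);
  by group followup;
  retain lcl ucl;
  if first.group then call missing(lcl, ucl);
  /* Remember the limits from the latest BASELINE row */
  if inbase then do;
    lcl = _lcl;
    ucl = _ucl;
  end;
  /* Write only the original observations, now with limits attached */
  if inobs then output;
  drop _lcl _ucl;
run;
```

Censored observations that fall between two event times receive the limits of the preceding event time, which matches the flat stretch of the curve they sit on.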

Haemoglobin17
Obsidian | Level 7

If I keep the repeated values of the survival estimates, does this change the survival/failure curve?

FreelanceReinh
Jade | Level 19

If the S values for two different FOLLOWUP times (in the same group) are equal, then at least one of the observations should be a censored observation. These points correspond to a constant ("flat") part of the estimated survival curve. Duplicate S values for two or more observations with the same FOLLOWUP time correspond to only one point of the survival curve.

To draw the survival curve, you don't need the redundant duplicate S values. (You do need the censored times for the markers indicating censored observations, though.) Between the points defined by the event times and the corresponding unique S values, the curve is just flat.
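
Since the original goal was a failure curve, here is a rough sketch of how the condensed BASELINE dataset could be turned into one within SAS before moving to R. It relies only on the identity failure = 1 - S, which swaps the roles of the confidence limits. The names fail, failure, fail_lcl and fail_ucl are hypothetical, and the plot options are just one reasonable choice.

```sas
data fail;
  set want;
  failure  = 1 - s;
  fail_lcl = 1 - ucl;  /* upper survival limit gives the lower failure limit */
  fail_ucl = 1 - lcl;  /* lower survival limit gives the upper failure limit */
run;

proc sgplot data=fail;
  /* Step-shaped confidence band per group */
  band x=followup lower=fail_lcl upper=fail_ucl /
       group=group type=step transparency=0.7;
  /* Step function: flat between event times, as described above */
  step x=followup y=failure / group=group;
  yaxis label="Failure probability" min=0;
run;
```

Markers for censored observations are not drawn here, because the censored times are not part of the condensed dataset.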
