Dear sasusers
I am having difficulty understanding the Brookmeyer-Crowley (henceforth BC) method of CI construction for the median survival time estimated from a KM-estimated survival function, as implemented by proc lifetest. As per the documentation here https://documentation.sas.com/doc/en/statug/latest/statug_lifetest_details03.htm , the method is outlined in the screenshot below.
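For anyone who cannot see the image: as best I can transcribe the documentation (so treat the notation as approximate), the CI is the set of all times t satisfying

\[ \left| \frac{g\{\hat S(t)\} - g(1-0.25)}{g'\{\hat S(t)\}\,\hat\sigma\{\hat S(t)\}} \right| < z_{1-\alpha/2} \]

where g is the chosen transform, \(\hat\sigma\{\hat S(t)\}\) is the standard error of the KM estimate, and \(z_{1-\alpha/2}\) is the standard normal quantile.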
Note the formula in the image above is for obtaining the BC CI for the 25th percentile of S(t) - thus for a general percentile p the g(1-0.25) would be g(1-p), and in particular g(0.5) for the median. This has been discussed on this forum before here https://communities.sas.com/t5/Statistical-Procedures/Brookmeyer-and-Crowley-Confidence-Intervals/td... which contains a nice discussion and a link to data and code purporting to implement the BC CI "manually" (this involves converting to Z scores using the loglog transform and implementing the test-based procedure).
The link is https://stats.oarc.ucla.edu/sas/examples/asa2/applied-survival-analysis-by-hosmer-lemeshow-and-maych... and I will discuss the code contained therein a little later in my question.
From the SAS forum link above, it seems the macro is comparing the Lifetest BC CIs with the manually implemented CIs. The original code allowed Lifetest to default to g=loglog, but I modified it to let the user specify any type of "g-transformation" of S(t) that Lifetest allows. The reason (as described shortly) is that g=linear is supposed to yield the original BC CIs as per their 1982 paper, whereas g<>linear produces something similar "in spirit" but not in actuality. You need to download the data from the stats.oarc website in the links above, and then run this code using the attached updated macro to get Lifetest to generate the estimates and CIs for the times where S(t)=0.25, 0.5, 0.75 (here using g=loglog);
libname in "your path to data";

data t51;
   set in.whas500;
   lf = lenfol / 365.25;   /* follow-up time in years (lenfol is in days) */
   cnt = 1;
run;

proc sort data=t51;
   by gender;
run;

/* stderr writes SDF_STDERR to the OUTSURV= dataset, which the macro needs */
proc lifetest data=t51 stderr outsurv=_s_ alpha=0.05 alphaqt=0.05 conftype=loglog;
   ods output ProductLimitEstimates=km SurvivalPlot=sp Quartiles=quart;
   by gender;
   time lf * fstat(0);
run;
proc print data=quart;
run;
Giving the following output;
To produce the manual BC CIs run the following code;
%bcconf(t51, lf, fstat, byvar=gender, level=95, q=50, conf_type=loglog);
%bcconf(t51, lf, fstat, byvar=gender, level=95, q=50, conf_type=linear);
Producing the following output;
So for Lifetest using g=loglog compared with the manual code using loglog, the results are different - but only slightly. Not surprisingly, the agreement is better when both use the loglog g-transform than when the manual code uses linear.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
However, when I simulate data I get very different behaviour (an oncology example with N=300 people in two treatment groups, across three different levels of data maturity corresponding to data cut-off (DCO) dates for interim analyses): the BC CIs coming from Lifetest often lack the upper endpoint of the CI - i.e. they are like 1-sided CIs. In contrast, the manual BC CIs work well, producing complete CIs most of the time.
The simulated data had the following properties: the control median is 7 months (approx 210 days) versus an active median of 11.5 months (approx 345 days), corresponding to a HR of 0.61. Event times are generated from the standard constant-hazard process - i.e. exponentially distributed times. Censoring times are also exponentially distributed, calibrated to give 5% censoring incidence at 12 months, and censoring competes with events in the simulation.
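Schematically, the set-up can be simulated along the following lines (a sketch only: the per-arm N, seed and variable names are illustrative, and the event-driven DCO/administrative censoring step is omitted);

/* Sketch of the simulation set-up; N per arm, seed and names illustrative */
data sim;
   call streaminit(20240101);
   lam_c = -log(0.95)/12;                  /* censoring rate: P(C<=12m)=5%  */
   do arm = 0 to 1;                        /* 0 = control, 1 = active       */
      if arm = 0 then lam_e = log(2)/7;    /* control median 7 months       */
      else            lam_e = log(2)/11.5; /* active median 11.5m (HR 0.61) */
      do i = 1 to 150;                     /* ~300 subjects in total        */
         t_event = rand('exponential') / lam_e; /* exponential event time   */
         t_cens  = rand('exponential') / lam_c; /* exponential censor time  */
         time  = min(t_event, t_cens);     /* censoring competes with events*/
         event = (t_event <= t_cens);      /* 1 = event, 0 = censored       */
         output;
      end;
   end;
   keep arm time event;
run;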
To see the problem I chose one of many "low maturity" datasets for which Lifetest gives a 1-sided CI. This particular dataset has the following features;
- Active arm: N = 117 subjects, of whom 32% had an event and 1.7% were censored before the DCO; observed median follow-up of 129 days (4.24 months); largest observed event time of 418 days
- Control arm: N = 113 subjects, of whom 54% had an event and 1.76% were censored before the DCO; observed median follow-up of 161 days (5.29 months); largest observed event time of 474 days
- Thus a total of 38 (active) + 62 (control) = 100 events out of 117 (active) + 113 (control) = 230 people, of whom 77 (active) + 49 (control) = 126 were administratively censored at the DCO (the DCO was event driven, aiming for 100 events)
Using the linear CIs in lifetest, the following output is produced. First the KM plot: I have circled the day-338 event time for the active arm, corresponding to the estimated median (one event time at 418 days follows it). I have also circled where the S(t)=0.5 horizontal line intersects the lower 95% confidence limit, which is intuitively how the lower bound for the median survival time can be found. Note this cannot be done for the upper bound - the upper confidence limit never drops below S(t)=0.5, and hence (possibly) the upper bound should be missing. For the control group this is not a problem, since the 95% confidence band drops below S(t)=0.5.
Then the active group median and mean are reported
So no upper bound for the median survival time (or the 75th percentile for that matter) - as expected from the previous intuition. Then the control group median and mean
This time the median CIs are fine - only the 75th percentile is not OK, again as expected.
Running the manual BC CI calculations in the macro (using conf_type=linear), the results for 10 such datasets for the active arm, comparing the Lifetest BC CIs with the manual BC CIs, are given below (the dataset detailed above is s=2);
Below is the KM curve for dataset number 2 (I have unashamedly drawn the manual BC CIs on with a painting tool - sorry, I ran out of time to do that properly!). What is clear is that whilst the upper 95% confidence limit on S(t) does not intersect S(t)=0.5, the manual BC CI method has declared the upper 95% bound to be the last observed event at t=418 days. Whilst this wide manual BC CI looks plausible given the relatively flat survival curve around the median of t=338, presumably this is not a true CI upper bound but rather a logical/sensible value at which to truncate the interval?
So after all this my question is simple: should the upper CI limit for the median survival time be set at the last observed event time when the upper bound of the estimated survival function does not intersect 0.5?
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Some thoughts after trying to digest (quickly) the literature. The image below is a screen shot from the original paper,
and I think I can see, by putting g(x)=x (i.e. the identity function) in the SAS formula, and using the fact that the square root of a chi-square random variable with 1 degree of freedom is (in absolute value) a standard normal, that the SAS formula with the identity g does indeed implement the test-based method of Brookmeyer-Crowley as per their original paper. However, the original paper seems to require more detail - for instance, what happens if the confidence set obtained (R_alpha in the above screenshot) is not an interval (perhaps it is a union of intervals)? Do we present that, given the "I" in CI implies an interval and not a region?
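To spell out the reasoning (my own working): with g the identity we have g'(x)=1, and since \(z_{1-\alpha/2}^2 = \chi^2_{1,1-\alpha}\), the SAS set for the median becomes

\[ \left\{ t : \left| \frac{\hat S(t) - 0.5}{\hat\sigma\{\hat S(t)\}} \right| < z_{1-\alpha/2} \right\} \;=\; \left\{ t : \frac{\{\hat S(t) - 0.5\}^2}{\hat\sigma^2\{\hat S(t)\}} < \chi^2_{1,1-\alpha} \right\}, \]

which is exactly the test-based set R_alpha in the paper.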
The next page in the original paper is as follows;
So it seems from their simulations that R_alpha is "almost always" an interval. The highlighted box then suggests they define an interval I_alpha=[t_i,t_j) based on R_alpha (i.e. to ensure an interval);
- t_i: smallest observed event time in R_alpha with S(t_i)>0.5
- t_j: smallest observed event time not in R_alpha with S(t_j)<0.5
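In symbols (my paraphrase of the highlighted box, with minima taken over the observed event times t_k):

\[ t_i = \min\{ t_k \in R_\alpha : \hat S(t_k) > 0.5 \}, \qquad t_j = \min\{ t_k \notin R_\alpha : \hat S(t_k) < 0.5 \}, \qquad I_\alpha = [t_i, t_j). \]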
In the dataset above, the confidence set is identified in the macro using these lines;
data _sq_;
   set _s_ (keep = &time survival sdf_stderr);
   /* log(-log) transform of the KM survival estimate */
   z = log(-log(survival));
   /* delta-method variance of log(-log(survival)) */
   v = sdf_stderr**2 / (survival**2 * (log(survival))**2);
   /* standardized distance from the target quantile (q=50 -> median) */
   z&q = abs(z - log(-log((100-&q)/100))) / sqrt(v);
run;
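That is, v is the delta-method variance of the log(-log) transform of the KM estimate, and z&q (here Z50, since (100-50)/100 = 0.5) is the standardized distance from the target quantile:

\[ v = \frac{\widehat{\mathrm{Var}}\{\hat S(t)\}}{\{\hat S(t)\,\log \hat S(t)\}^2}, \qquad Z_{50}(t) = \frac{\bigl|\log\{-\log \hat S(t)\} - \log(-\log 0.5)\bigr|}{\sqrt{v}}. \]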
In the above, _s_ (the survival function estimates) comes from the lifetest procedure. The result of this step, before selecting the times in R_alpha, is as follows;
The red box corresponds to |Z50| < 1.96 and is how R_alpha is identified. We see t=238 is indeed the smallest time with |Z50| < 1.96 and S(t) > 0.5; thus t_i = 238. However, we also see t=418 is in R_alpha, since its Z satisfies |Z50| < 1.96. Thus t_j does not exist, since no further event times follow. The text on page 32 suggests the interval should then be 1-sided, with t_j = infinity, which agrees with the CI from Lifetest. However, the macro code sets t_j=418 and only does this check;
data _sqall2_;
   set _sqall_;
   /* discard bounds that fall on the wrong side of the point estimate */
   if bc_upper <= estimate then bc_upper = .;
   if bc_lower >= estimate then bc_lower = .;
   percentile = &q;
run;
Since t_j = 418 > 338 (the median estimate), t_j is retained and the final CI is (238, 418).
In summary;
- Lifetest CIs seem to adhere to the original BC paper by not setting the upper bound to the largest observed time when t_j does not exist
- The macro code, in contrast, sets t_j to the largest event time in R_alpha as long as this is larger than the median estimate (see the sketch after this list for how that step could be changed to match Lifetest)
- The CI from the macro code is preferable in that it is a tighter interval than the one from Lifetest (not hard, given Lifetest's upper bound is infinity), but it is not a proper upper CI bound, rather a truncated interval - i.e. there is nothing to say that the true median does not lie above this value?
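If the Lifetest behaviour were preferred, the macro's final step could be amended along these lines (a sketch only: max_event_time and last_in_ralpha are hypothetical variables I assume are computed upstream - they are not part of the original %bcconf macro);

data _sqall2_;
   set _sqall_;
   if bc_upper <= estimate then bc_upper = .;
   if bc_lower >= estimate then bc_lower = .;
   /* if the largest observed event time is itself inside R_alpha then t_j
      does not exist: leave the interval 1-sided, as Lifetest does
      (max_event_time and last_in_ralpha are assumed/illustrative) */
   if bc_upper = max_event_time and last_in_ralpha = 1 then bc_upper = .;
   percentile = &q;
run;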
Accepted Solutions
A long post, and I fear I may have missed the question, but I will point out that one of your last statements - "i.e. there is nothing to say that the true median does not lie above this value?" - is critical. It is why the upper bound from LIFETEST is not defined, and it seems to me a really good reason to use the LIFETEST CIs rather than the macro-generated CIs.
But that is just me.
SteveDenham
Hi Steve
Many thanks for the reply. Yes, sorry for the length - it was just in case the answer required knowledge of the data configuration. But no, you did not miss the main question, which is about the intuitive appeal (or not) of truncating the upper bound of the CI rather than setting it to infinity. I suppose I thought that since somebody had gone to the effort of truncating (in the macro), and they are clearly knowledgeable about stats, there was a rationale with merit. My data in the early interims will often give rise to these 1-sided CIs, so an alternative 2-sided CI would be of interest - but not if that is an inferentially risky strategy.