☑ This topic is solved.
dandar
Obsidian | Level 7

I am trying to perform a 1-sided group sequential design using proc seqdesign. I am not too familiar with group sequential designs but believe the issues I am experiencing are not knowing how to "drive the software" rather than GSD-related (I may be wrong however!). In advance I appreciate your time looking at this.

 

The settings are an assumed hazard ratio of 0.3 (<1 favours active treatment) against a threshold of 1 (i.e. testing for superiority against a null HR of 1). I want 90% power and a 1-sided type I error of 2.5%. The control arm event rate is 1% in 1 year, and for subject numbers I have chosen a follow-up time of 1 year and an accrual period of 1 year. Subject numbers are not the primary question; I supply them only because the "twosamplesurvival" model in proc seqdesign requires these inputs. So take these as examples only.

 

Using the following proc power code I get a required event count of 28 (I "manually" set "nsubinterval=2" to bring this close to the R output of 29 events; increasing this value increases the event count. This discrepancy is not my primary question, but any thoughts would be welcome). Later I am interested in testing against a null HR<1, but proc power only seems to test against HR=1 (again, any thoughts welcome).

 

%let Lambda_cnt=0.01005;
%let AT = 1;
%let FUT=1;
%let HR=0.3;
%let alpha=0.025;
%let beta=0.1;

proc power;
   twosamplesurvival test=logrank
      refsurvexphazard = &Lambda_cnt.
      hazardratio = &HR.
      accrualtime = %sysevalf(&AT.)
      followuptime = %sysevalf(&FUT.)
      groupweights = (1 1)
      eventstotal=.
      alpha=&alpha
      power = %sysevalf(1-&beta.)
      sides=U
      nsubinterval=2;
run;

The following two code snippets are exactly the same except that the first uses "alt=lower", since I want to do a lower 1-sided test. However, I get a warning that the SAMPLESIZE statement is not being used. The second snippet runs fine but with an upper boundary and with the sign of the assumed effect switched to positive (the drift parameter in the first is negative as expected, but in the second it is simply switched to positive).

 

So my first question is: why can I not force SAS to give me a lower test? It appears the results are fine (just "flipped" to be an upper test), but I find this annoying and also unsettling (i.e. I am not at all sure I am using the right code). I have been looking at the help documents for a long time, and if I have missed a simple option I would be amazed (although I have been known to miss these things!).

 

%let HR_thresh = 1;

title "alt=lower: Sample size statement not working";
proc seqdesign altref=%sysfunc(log(&HR.))  stopprob errspend plots=boundary plots=power plots=ASN plots=errspend;
ods output Boundary=BI SampleSizeSummary=SSS_D stopprob=stoprob samplesizesummary=sss errspend=errspend;
           
   OBrienFleming: design nstages=2 method=obf alpha=&alpha. beta=&beta. alt=lower stop=reject  ;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.                       
                      hazardratio=%sysevalf(&HR.) 
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;


title "alt=upper: Sample size statement  working";
proc seqdesign altref=%sysfunc(log(&HR.))  stopprob errspend plots=boundary plots=power plots=ASN plots=errspend;
ods output Boundary=BI SampleSizeSummary=SSS_D stopprob=stoprob samplesizesummary=sss errspend=errspend;
           
   OBrienFleming: design nstages=2 method=obf alpha=0.025 beta=&beta. alt=upper stop=reject  ;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.                       
                      hazardratio=%sysevalf(&HR.) 
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

 

The following is an image of the warning message for the first code snippet:

 

dandar_1-1691428680598.png

 

My second question relates to the threshold. If I set HR_thresh to 0.9, my R sample sizing tells me I now need 35 events rather than 29 because of the more stringent test. But the SAS output event numbers stay at 29 (even though a new line appears accounting for this change, circled in red). So what am I doing wrong here?

 

 

 

dandar_2-1691429321697.png

 

 

clayt85
SAS Employee

Greetings, and thank you for your question.

 

I apologize in advance for the long response, but I hope you will find it useful. I've tried to include links to relevant sections of the documentation in case you want additional information.

 

First, on PROC POWER. Setting NSUBINTERVAL=2 is much too low. The fact that doing so produces a result of 28 events (close to the R output of 29 events) is pure coincidence. In fact, even the default value of NSUBINTERVAL=12 seems a bit too low in this case. Setting that value much higher (on the order of 1000) will produce a result that is accurate to well within 1%.
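As a sketch of what that looks like in practice, here is the original PROC POWER call with only the NSUBINTERVAL= value changed. The value 1000 is simply the "on the order of 1000" suggestion above, not a tuned setting, and the macro variables are repeated so the snippet runs on its own.

```sas
/* Same scenario as the original PROC POWER call; only NSUBINTERVAL differs. */
%let Lambda_cnt=0.01005;   /* control-arm exponential hazard */
%let AT=1;                 /* accrual time (years)           */
%let FUT=1;                /* follow-up time (years)         */
%let HR=0.3;               /* assumed hazard ratio           */
%let alpha=0.025;          /* 1-sided type I error           */
%let beta=0.1;             /* type II error (power = 0.9)    */

proc power;
   twosamplesurvival test=logrank
      refsurvexphazard = &Lambda_cnt.
      hazardratio      = &HR.
      accrualtime      = &AT.
      followuptime     = &FUT.
      groupweights     = (1 1)
      eventstotal      = .
      alpha            = &alpha.
      power            = %sysevalf(1-&beta.)
      sides            = U
      nsubinterval     = 1000;   /* finer grid than the default of 12 */
run;
```

Rerunning with a few different NSUBINTERVAL= values and watching the event count settle is a simple way to judge whether the grid is fine enough for a given scenario.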

 

As to why the PROC POWER result (ceiling of 37 events) differs from your R analysis (29 events), my guess is that there are some options/methods/etc that differ somewhere. These power and sample size computations (particularly for survival endpoints), even when nominally similar in their options specifications, can use different underlying statistical models and computational approximations. This leads to different results. That said, here is a code snippet that uses PROC SEQDESIGN to produce a fixed-sample design and agrees with your other result (29 events):

%let Lambda_cnt=0.01005; %let AT = 1; %let FUT=1; %let HR=0.3; %let alpha=0.025; %let beta=0.1;
%let HR_thresh = 1;

proc seqdesign altref=%sysfunc(log(&HR.));
   FixedSample: design nstages=1 alpha=&alpha. beta=&beta. alt=upper;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.                       
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;
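As a rough hand check of the 29 events, assuming 1:1 allocation and the usual Schoenfeld-type approximation for the log-rank test (this is a sketch of the standard formula, and the SEQDESIGN documentation remains the authority on the exact computation), with alpha = 0.025, beta = 0.1 and HR = 0.3:

```latex
D \;=\; \frac{4\,\bigl(z_{1-\alpha} + z_{1-\beta}\bigr)^{2}}{(\log \mathrm{HR})^{2}}
  \;=\; \frac{4\,(1.959964 + 1.281552)^{2}}{(1.203973)^{2}}
  \;\approx\; 28.995
  \quad\Longrightarrow\quad \lceil D \rceil = 29
```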

The computations performed by PROCs POWER and SEQDESIGN are summarized in the SAS/STAT documentation (PROC POWER > Details > Computational Methods and Formulas > Analyses in the TWOSAMPLESURVIVAL Statem...; PROC SEQDESIGN > Details > Applicable Two-Sample Tests and Sample Size Computation).

 

Notice that, in the code above, I specified ALT=UPPER. I also removed the HAZARDRATIO suboption in the SAMPLESIZE statement. To understand why, compare the outputs from the following two analyses:

proc seqdesign altref=1.203973;
   FixedSample: design nstages=1 alpha=&alpha. beta=&beta. alt=upper;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

proc seqdesign altref=1.203973;
   FixedSample: design nstages=1 alpha=&alpha. beta=&beta. alt=lower;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

Here I have replaced the %sysfunc macro with the hard-coded numerical value just for transparency. The only difference between these two invocations is ALT=UPPER vs ALT=LOWER on the DESIGN statement.

 

Compare the Design Information tables. You can see that the value of the alternative reference has changed sign, even though I specified the same alternative reference both times! This is because of the way SEQDESIGN handles the ALTREF= option: it uses the absolute value of the user-specified value, and then applies the sign (+/-) implied by the ALT= option.

 

Now compare the Sample Size Summary tables. The hazard rates for Group A are different (the corresponding hazard ratios are reciprocals of each other). These hazard rates are calculated from the alternative reference (see the documentation: SEQDESIGN > Syntax > SAMPLESIZE Statement > Two-Sample Models > MODEL=TWOSAMPLESURVIVAL Subsection). With an alternative reference of 1.203973 you get one value; with -1.203973 you get the reciprocal value. You can see that your intended analysis (Hazard Ratio for Group A is 0.3) occurs when ALT=UPPER.
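The relationship between the sign of the alternative reference and the Group A hazard can be reproduced with a few lines of DATA step arithmetic. This is only a sketch, assuming the alternative reference is the negative log hazard ratio relative to a reference hazard of 0.01005 as in this thread; the variable names are illustrative and are not SEQDESIGN output names.

```sas
/* Hazard ratio and Group A hazard implied by each sign of the alternative reference. */
data _null_;
   h_b = 0.01005;                   /* reference (Group B) hazard used in the thread   */
   do theta = 1.203973, -1.203973;  /* effective alternative reference: UPPER, LOWER   */
      hr  = exp(-theta);            /* hazard ratio implied by theta = -log(HR)        */
      h_a = h_b * hr;               /* implied Group A hazard                          */
      put theta= 10.6 hr= 8.4 h_a= 10.6;
   end;
run;
```

The positive value corresponds to a hazard ratio of 0.3 and the negative value to its reciprocal (about 3.33), which is why only ALT=UPPER matches the intended design.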

 

So why is this an ALT=UPPER test when we want to compare an assumed effect of 0.3 against a null hypothesis of 1.0? Because the power analysis is not based on the hazard rates but rather on the negative log of the hazard ratio. For you, this means -log(0.3/1.0) = 1.203973. This is explained in greater detail in the previously linked sections of the SEQDESIGN documentation.

 

The next question is: why did I need to remove the HAZARDRATIO suboption in the SAMPLESIZE statement? The truth is: I did not *have* to remove it. The following code will work just fine:

proc seqdesign;
   FixedSample: design nstages=1 alpha=&alpha. beta=&beta. alt=upper;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.
                      hazardratio=%sysevalf(&HR.) 
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

Notice that this time I removed the ALTREF= option. That is because the ALTREF= option is extraneous: the value of the alternative reference can be computed from the information provided on the SAMPLESIZE statement. And this is the key to understanding the warning message that you received (in which the SAMPLESIZE statement was ignored because of the ALT=LOWER option). In your code, SEQDESIGN realizes that it has enough information (on the SAMPLESIZE statement) to compute the alternative reference, so it does. The result of that computation is positive (1.203973). This means setting ALT=LOWER does not make any sense; SEQDESIGN prints a warning and then ignores the SAMPLESIZE statement.

 

(When printing the warning message, SEQDESIGN *prints* the value of the ALTREF= option if one is provided. That is why, in your screenshot, you see "WARNING: The alternative reference -1.204 is derived...". In fact, the value that is derived by SEQDESIGN is 1.204 (positive) while the value of -1.204 (negative) was provided in the ALTREF= option. This is a bug in the code that prints the warning message.)

 

I hope this explains why ALT=UPPER is the correct hypothesis test for your use case.

 

On the final issue, when you set HR_thresh = 0.9: what version of SAS are you running? In my hands, I obtain the 35 events that you are expecting:

Screenshot 2023-08-08 161334.png

dandar
Obsidian | Level 7

@clayt85  please do not apologise for such a long answer - this is exactly the detail I needed. Thank you so much for taking the time to explain things I greatly appreciate this.

 

The issue of the number of subintervals in proc power needing to be high makes sense if an approximation is occurring via a piecewise hazard function or something like this (i.e. more pieces means more resolution/granularity), and I suppose I had not yet accepted that, as you correctly point out, sometimes methods just do differ. However, 5-6 events can be a big deal in very slow recruiting and/or low event rate settings. I suppose "being conservative" and using the largest event number is an option. However, your 1-stage (i.e. fixed) sample size design from proc seqdesign does agree closely, so the discrepancy (versus the rpact software in R, and also my manual calculations) lies with proc power.

 

I see now that I missed that the test is based on -log(HR) and not log(HR), so I suppose I was being lazy and had concluded SAS was "flipping things". However, the "sample size not being used" warning from proc seqdesign, which originates from the duplication of information about the alternative effect (i.e. using "altref" in the proc seqdesign line versus "hazardratio" in the samplesize line), would probably have baffled me for some time further, so you have saved me a lot of time! Incidentally, the same reasoning lies behind my using an upper test in proc power, i.e. I realised SAS had flipped the test, and presumably the documentation states it uses -log(HR).

 

On the final subject of the null hazard ratio (my "HR_thresh" macro variable). It is reassuring to know you can obtain the 35 events when using HR_thresh=0.9. Take a look at my output below (I use two versions - one with and without the hazard ratio statement) and they both report a hazard ratio of 0.27 even though both code snippets use HR=0.3. The log(HR/HR_thresh) = log(HR)-log(HR_thresh) should equal -1.0986 but is being reported as -1.20397. The log(HR_thresh) is correctly reported as -0.10536. The log(HR) is incorrectly reported as -1.30933 when it should be -1.2039.
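The figures quoted above can be reproduced with plain DATA step arithmetic. This is a sketch only: it shows that the reported values are what you get if the two logs are added rather than subtracted, and makes no claim about what SEQDESIGN does internally. The variable names are illustrative.

```sas
/* Reproduce the log values quoted in the post for HR=0.3 and HR_thresh=0.9. */
data _null_;
   hr        = 0.3;
   hr_thresh = 0.9;
   log_hr       = log(hr);                /* expected log(HR)                       */
   log_thresh   = log(hr_thresh);         /* log of the null (threshold) HR         */
   intended     = log_hr - log_thresh;    /* log(HR/HR_thresh), the intended effect */
   reported_log = log_hr + log_thresh;    /* matches the value shown in the output  */
   reported_hr  = exp(reported_log);      /* hazard ratio implied by that value     */
   put log_hr= 10.5 log_thresh= 10.5 intended= 10.5;
   put reported_log= 10.5 reported_hr= 6.2;
run;
```

Since 0.3 x 0.9 = 0.27, the reported hazard ratio of 0.27 and the reported log of -1.30933 are mutually consistent with each other, just not with the intended HR of 0.3.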

 

Maybe my code is wrong? Can you please post your code - i.e. you may have corrected mine? I am using SAS Studio release 3.81 (Enterprise Edition) SAS release 9.04.01M7P08062020 running on a Linux LIN X64 platform. 

 

%let Lambda_cnt=0.01005; 
%let AT = 1; 
%let FUT=1; 
%let HR=0.3; 
%let alpha=0.025; 
%let beta=0.1; 
%let HR_thresh = 0.9;


title "Gives HR=0.27: altref specified";
proc seqdesign altref=%sysfunc(log(&HR.)) plots=(none)  ;
          
   fixed: design nstages=1 alpha=0.025 beta=&beta. alt=upper   ;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.                        
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

title "Gives HR=0.27: hazard ratio specified";
proc seqdesign  plots=(none)  ;
          
   fixed: design nstages=1 alpha=0.025 beta=&beta. alt=upper   ;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.  
                      hazardratio=&HR.       
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

The output (same for both);

 

dandar_0-1691568553023.png

 

clayt85
SAS Employee

Interesting. I see the problem. Here are my results when I run your program exactly as written.

SampleSizeSummary1.png

SampleSizeSummary2.png

As you can see, the results differ. (Note also that the results immediately above are slightly different from the results in my previous response. That is because the code in this message uses NSTAGES=1, while in my previous response I used NSTAGES=2.)

 

I see your statement that, when you run this latest code snippet, you obtain the same results with both invocations of SEQDESIGN. This is a known bug. First, I'll explain why the results *should* differ. Then I'll explain how you can work around the bug with SAS 9.4M7.

 

When the two groups have different NULLHAZARD values, the alternative reference is no longer simply the negative log of the hazard ratio. Rather, you must subtract out the negative log of the hazard ratio at baseline. (See the definition of theta_1 in the MODEL=TWOSAMPLESURVIVAL Section here.) Thus, the first invocation (where you specify the ALTREF= value) and the second invocation (the alternative reference gets computed internally by SEQDESIGN) will not match because the alternative reference values will be different. It is the latter invocation of SEQDESIGN that seems to match your stated objectives.
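Written out with the numbers from this thread, the adjustment looks as follows. The symbols for the null hazards (h_{0a}, h_{0b}) and alternative hazards (h_{1a}, h_{1b}) are illustrative notation chosen here and may not match the documentation's symbols exactly.

```latex
\theta_1 \;=\; -\log\!\left(\frac{h_{1a}}{h_{1b}}\right)
          \;-\; \left[-\log\!\left(\frac{h_{0a}}{h_{0b}}\right)\right]
        \;=\; -\log(0.3) + \log(0.9)
        \;=\; 1.203973 - 0.105361
        \;=\; 1.098612
```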

 

The bug occurs when you use the NULLHAZARD= option to specify two (different) baseline hazards and then also specify the HAZARDRATIO= option. In this case, SEQDESIGN does not compute the correct value of the hazard for Group A. My comments in the previous paragraph give you a hint for the workaround: you use the (correct) formula to compute the ALTREF= value:

proc seqdesign altref=%sysevalf(%sysfunc(log(&HR.))-%sysfunc(log(&HR_thresh.))) plots=(none)  ;
          
   fixed: design nstages=1 alpha=0.025 beta=&beta. alt=upper   ;
   samplesize model=twosamplesurvival 
                      (nullhazard=%sysevalf(&Lambda_cnt.*&HR_thresh.) &Lambda_cnt.                        
                      acctime=&AT. accrual=UNIFORM  foltime=&FUT.);
run;

This will give you the results you desire.
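As an independent cross-check of the event counts discussed in this thread, the standard Schoenfeld-type approximation for a 1:1 log-rank test can be evaluated in a DATA step. This is a sketch under that assumption, not a reproduction of SEQDESIGN's internal computation; the variable names are illustrative.

```sas
/* Approximate required events for HR=0.3 against null HRs of 1 and 0.9. */
data _null_;
   alpha = 0.025;                                  /* 1-sided type I error   */
   beta  = 0.1;                                    /* type II error          */
   hr    = 0.3;                                    /* assumed hazard ratio   */
   z_sum = probit(1 - alpha) + probit(1 - beta);   /* z(0.975) + z(0.90)     */
   do hr_thresh = 1, 0.9;
      theta    = -(log(hr) - log(hr_thresh));      /* alternative reference  */
      events   = 4 * z_sum**2 / theta**2;          /* unrounded event count  */
      n_events = ceil(events);                     /* round up to an integer */
      put hr_thresh= theta= 8.6 events= 8.3 n_events=;
   end;
run;
```

This gives roughly 28.995 events (29 after rounding up) for a threshold of 1 and roughly 34.823 events (35 after rounding up) for a threshold of 0.9, in line with the 29 and 35 events quoted from R.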

dandar
Obsidian | Level 7

@clayt85  Amazing thank you again! I have now familiarised myself with the SAS notation (and this bug).  Do you know which is the earliest SAS version that corrects for this?

clayt85
SAS Employee

This bug has been fixed in all versions of the SAS Viya 4 platform. It still exists in all versions of the SAS 9 platform (9.4M8, 9.4M7, ...)
