JulieB
Fluorite | Level 6

I am trying to use proc seqdesign to calculate a sample size for a one sample binomial proportion.  

 

My problem is that I am having trouble understanding why specifying ref=nullprop gives such a different result than when I do not specify this option.  Can you help me understand what it means when the reference proportion is the alt ref?

 

Using the following code it recommends N=21 for this scenario:

ods graphics on;
proc seqdesign boundaryscale=stdz;
   OneSidedOBrienFleming: design method=obf
                          nstages=2
                          alt=upper
                          alpha=0.05
                          beta=0.10
                          stop=both
                          (betaboundary=nonbinding)
                          ;
   samplesize model=onesamplefreq(nullprop=0.10 prop=0.30 ref=nullprop);
   ods output Boundary=Bnd_Prop;
run;
ods graphics off;

 

However if I use this code, which I thought based on the help documentation should be equivalent, it recommends N=48.

ods graphics on;
proc seqdesign altref=0.20
               boundaryscale=stdz;
   OneSidedOBrienFleming: design method=obf
                          nstages=2
                          alt=upper
                          alpha=0.05
                          beta=0.10
                          stop=both
                          (betaboundary=nonbinding)
                          ;
   samplesize model=onesamplefreq(nullprop=0.10);
   ods output Boundary=Bnd_Prop;
run;
ods graphics off;

To me it seems like both situations should be testing the null proportion of 0.10 against the alternative of 0.30.  Can you help me understand the difference between the two?

 

Thanks,

JB

1 REPLY 1
FreelanceReinh
Jade | Level 19

Hi @JulieB,

 

I'm not really familiar with PROC SEQDESIGN, but I have some acquaintance with the subject from an earlier job.

 

A short answer to your question can be found in subsection "Test for a Binomial Proportion" of section "Applicable One-Sample Tests and Sample Size Computation" of the PROC SEQDESIGN documentation. There you can see that the procedure computes the total sample size as an expression proportional to p(1−p). However, as the true value of parameter p is unknown, it must be substituted in the sample size formula by a reasonable value. Via procedure options, either of the two "obvious candidates" can be chosen: p0=0.1 from the null hypothesis or p1=0.3 from the specified alternative. (The alternative hypothesis of the test is p>0.1.) Both of your programs specify the same alternative (ALTREF=0.20 added to NULLPROP=0.10 gives p1=0.3), but the first uses p0 in the variance term (REF=NULLPROP) while the second uses the default, p1. That's why the sample size using the alternative reference is about 2.3 times larger than the other (0.3*(1−0.3)/(0.1*(1−0.1))=2.333...).
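As a quick sanity check of that ratio, here is a minimal DATA step sketch. The variable names are my own and purely illustrative; it only compares the two variance terms and does not reproduce the full PROC SEQDESIGN sample size formula.

```sas
/* Sketch: compare the variance term p(1-p) under the two reference choices. */
/* All variable names here are illustrative, not taken from the procedure.   */
data _null_;
   p0 = 0.1;                      /* null proportion (NULLPROP=)            */
   p1 = 0.3;                      /* alternative proportion (PROP=)         */
   var_null = p0*(1-p0);          /* variance term used with REF=NULLPROP   */
   var_alt  = p1*(1-p1);          /* variance term used with REF=PROP       */
   ratio    = var_alt/var_null;   /* theoretical ratio of the sample sizes  */
   n_ratio  = 48/21;              /* ratio of the two N values you reported */
   put (_all_)(=/);
run;
```

The ratio of your two reported sample sizes (48/21) comes out slightly below 2.333, which would be consistent with both sample sizes having been rounded up to integers.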

 

The documentation argues that "in most cases, the proportion under the alternative hypothesis is used to derive the required sample size" because typically the null hypothesis will be rejected in the end. So, having used p0 for the sample size calculation would seem inconsistent in retrospect. Therefore, REF=PROP was made the default, not REF=NULLPROP.

 

The formulas in the documentation cited above use normal approximations, assuming a "large sample" and a p0 "not close to 0 or 1". (And it seems that this is the reason why the variance term containing p(1−p) comes into play.) I'm not convinced that your envisioned sample sizes and p0=0.1 clearly meet these assumptions. At least they don't satisfy the "np>=5, n(1−p)>=5" rule of thumb mentioned in standard textbooks and Wikipedia. But there seems to be a good alternative method: In their book "Group Sequential Methods with Applications to Clinical Trials" (which is frequently referred to in the PROC SEQDESIGN documentation), C. Jennison and B.W. Turnbull discuss the above-mentioned difficulties and suggest using exact methods (chapter 12, p. 235).
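The rule-of-thumb check can be sketched in a few lines. This is only an illustration with variable names of my own choosing, applied to the two sample sizes from your post and p0=0.1:

```sas
/* Sketch: check the "np>=5 and n(1-p)>=5" rule of thumb for p0=0.1 */
/* and the two sample sizes from the question (21 and 48).          */
data _null_;
   p0 = 0.1;
   do n = 21, 48;
      np = n*p0;                    /* expected number of successes under H0 */
      nq = n*(1-p0);                /* expected number of failures under H0  */
      ok = (np >= 5 and nq >= 5);   /* 1 if the rule is satisfied, else 0    */
      put n= np= nq= ok=;
   end;
run;
```

For both sample sizes the expected number of successes under the null hypothesis stays below 5, so the rule is not met in either case.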

 

One of the two-stage procedures they present in table 12.1 (p. 238), which originates from a 1982 article by T.R. Fleming, happens to match your requirements (one-sample binomial test of p=p0 vs. p>p0 with p0=0.1, p1=0.3, approximate type I error rate 0.05 and approx. power 0.9). If you don't have this book at hand, you can see the relevant page using the "Look inside" feature of a well-known online book seller :-). There you can find the relevant (inclusive) boundaries for acceptance (a_i) and rejection (b_i) and the sample sizes (m_i), i=1, 2. As it turns out, the total sample size m_1+m_2 for this procedure is about halfway between the two sample sizes you've calculated.

 

Moreover, it's easy to verify the actual type I and type II error rates stated in the book:

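data _null_;
/* Design as encoded below: stage sizes m1=20 and m2=15.                */
/* Stage 1: accept H0 if X1<=2, reject H0 if X1>=6, otherwise continue. */
/* Stage 2: reject H0 if X1+X2>=7, otherwise accept H0.                 */
/* alpha = P(reject H0 | p=0.1) */
alpha=1-cdf('binom',5,0.1,20)+pdf('binom',3,0.1,20)*(1-cdf('binom',3,0.1,15))
                             +pdf('binom',4,0.1,20)*(1-cdf('binom',2,0.1,15))
                             +pdf('binom',5,0.1,20)*(1-cdf('binom',1,0.1,15));
/* beta = P(accept H0 | p=0.3) */
beta=cdf('binom',2,0.3,20)+pdf('binom',3,0.3,20)*cdf('binom',3,0.3,15)
                          +pdf('binom',4,0.3,20)*cdf('binom',2,0.3,15)
                          +pdf('binom',5,0.3,20)*cdf('binom',1,0.3,15);
put (_all_)(=/);
run;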
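If you want to try other boundaries or stage sizes, the same calculation can be written with a loop. This is just a sketch: the names m1, m2, a1, b1 and b2 are my own labels for the design constants that appear as literals in the step above, and the rejection probability is evaluated at p=0.1 (giving alpha) and at p=0.3 (giving the power, i.e. 1−beta).

```sas
/* Sketch: loop version of the exact two-stage calculation above.     */
/* m1, m2, a1, b1, b2 are illustrative names for the design constants */
/* that appear as literals in the preceding DATA step.                */
data _null_;
   m1 = 20; m2 = 15;   /* stage sizes                                    */
   a1 = 2;  b1 = 6;    /* stage 1: accept H0 if X1<=a1, reject if X1>=b1 */
   b2 = 7;             /* stage 2: reject H0 if X1+X2>=b2                */
   do p = 0.1, 0.3;
      /* probability of rejecting H0 already at stage 1 */
      reject = 1 - cdf('binom', b1-1, p, m1);
      /* continue to stage 2 for a1 < X1 < b1, then reject if X2 >= b2-X1 */
      do x1 = a1+1 to b1-1;
         reject = reject
                + pdf('binom', x1, p, m1)*(1 - cdf('binom', b2-1-x1, p, m2));
      end;
      put p= reject=;
   end;
run;
```

For p=0.1 the value of reject should agree with alpha from the step above, and for p=0.3 it should equal 1−beta.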

I hope this helps.

 

 

 

 

 

 

 
