Hi Aj,
Sorry for not responding earlier (was busy with other work). I have a draft version of a function f1 which I thought would be an implementation of G. T. Fosgate's algorithm. However, when I compared his Table I with my results, I got a lot of discrepancies for the "Exact" columns (whereas the "Approx" columns matched perfectly, which was of course not that surprising).
Then I created a slightly modified version f2 of my function (introduced a "tolerance" parameter), which takes the "sawtooth" effect into account (cf. @ballardw's post). As a result, the number of discrepancies decreased (and could be further decreased by modifying the tolerance parameter), but was still substantial, unfortunately.
I have scrutinized two of the discrepant cases and used computer algebra software to evaluate equation (1) of the article for one case, in order to make sure that no numerical accuracy issues were involved. Result: The probability value calculated by SAS was accurate to 13 significant decimal places! So, I'm fairly confident that the accuracy is sufficient (at least for this case). In fact, the calculation showed that for n=67, p0=0.55, d=0.1 (i.e., the confidence interval [0.45, 0.65]) and alpha=0.1, the LHS of equation (1), using x=37 (because 67*0.55=36.85, which rounds to 37), results in 0.0976628..., which is apparently < alpha. According to p. 2860 of the article, the "appropriate sample size has been reached when the sum of the tail probabilities [calculated with eqn. (1)] is less than ... alpha ..." Hence, the result should be 67 (since all smaller sample sizes led to values > 0.1). Interestingly, the "sawtooth" effect does not interfere with this result (for this specific combination of parameters), i.e., when the sample size is increased further, the calculated probability remains < 0.1.
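For anyone without SAS at hand, this single evaluation of the LHS of equation (1) (half the point masses at x plus both tail probabilities) can be cross-checked with a few lines of stdlib Python; the helper names are mine:

```python
from math import comb

def binom_pmf(k, p, n):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, p, n) for i in range(k + 1))

def eq1(n, pL, pU, x):
    """LHS of eqn. (1): half the point masses at x plus both tails."""
    return ((binom_pmf(x, pL, n) + binom_pmf(x, pU, n)) / 2
            + 1 - binom_cdf(x, pL, n) + binom_cdf(x - 1, pU, n))

print(eq1(67, 0.45, 0.65, 37))  # ~0.0976628, i.e. < alpha = 0.1
```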
So the question is why G. T. Fosgate's algorithm obtained 68, not 67 (see Table I). It is true that the exact tail probabilities (as calculated with the formulas on p. 2858) sum to more than 0.1, but they do not drop below this value before n=76, if I'm not mistaken.
There is also another minor issue: On p. 2860 it says that the "value of x ... is always 1 at the first iteration ..." I'm not so sure about this: wouldn't x be 99 for p0=0.99 and n=100, or did I misunderstand the author?
Anyway, here is my draft code:
/* Define sample size functions f1 and the modified version f2 */
proc fcmp outlib=work.funcs.test;

/* f1: smallest n for which the eqn. (1) tail sum drops below alpha */
function f1(pL, pU, conf); /* pL, pU: lower/upper confidence limit, conf: confidence level */
  alpha=round(1-conf, 1e-10);
  p0=(pL+pU)/2;
  /* n0: period of the "sawtooth" (smallest n with integer n*p0); not used in f1 */
  do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
  end;
  do n=1 to 10000 until(p<alpha);
    x=round(n*p0); /* expected number of successes */
    /* eqn. (1): half the point masses at x plus both tail probabilities */
    p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n),1e-10);
  end;
  return(n);
endsub;

/* f2: like f1, but resets whenever the "sawtooth" pushes p back above alpha+tol */
function f2(pL, pU, conf, tol); /* tol: tolerance parameter */
  alpha=round(1-conf, 1e-10);
  p0=(pL+pU)/2;
  do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
  end;
  do n=n0 to 10000;
    x=round(n*p0);
    p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n),1e-10);
    if p<alpha & n1=. then n1=n;    /* remember first n meeting the criterion   */
    else if p>=alpha+tol then n1=.; /* reset if a later n exceeds alpha+tol     */
  end;
  return(n1);
endsub;

quit;

options cmplib=work.funcs;
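As a sanity check on f1's logic outside SAS (this is a rough port of my draft, not the article's reference implementation), here is the same search in stdlib Python. Note that SAS's round() rounds halves away from zero, which I mimic with floor(n*p0 + 0.5):

```python
from math import comb, floor

def binom_pmf(k, p, n):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, p, n) for i in range(k + 1))

def f1(pL, pU, conf):
    """Smallest n for which the eqn. (1) tail sum drops below alpha."""
    alpha = round(1 - conf, 10)      # mirrors round(1-conf, 1e-10) in SAS
    p0 = (pL + pU) / 2
    for n in range(1, 10001):
        x = floor(n * p0 + 0.5)      # SAS round(): halves away from zero
        p = ((binom_pmf(x, pL, n) + binom_pmf(x, pU, n)) / 2
             + 1 - binom_cdf(x, pL, n) + binom_cdf(x - 1, pU, n))
        if p < alpha:
            return n

print(f1(0.45, 0.65, 0.9))  # 67 (Table I says 68)
```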
options cmplib=work.funcs;
/* Try to replicate the values corresponding to Table I of the article */
data test;
  do i=50 to 90 by 5;
    p0=i/100;
    do conf=0.9, 0.95, 0.99;
      do d=0.1, 0.05;
        e1=f1(p0-d, p0+d, conf);          /* draft implementation        */
        e2=f2(p0-d, p0+d, conf, 0);       /* sawtooth-aware, tol=0       */
        e2t=f2(p0-d, p0+d, conf, 0.0005); /* sawtooth-aware, tol=0.0005  */
        a=ceil(p0*(1-p0)*(probit(1-(1-conf)/2)/d)**2); /* normal approximation */
        output;
      end;
    end;
  end;
  drop i;
run;
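The "Approx" column is just the usual normal-approximation sample size, n = ceil(p0*(1-p0)*(z/d)**2) with z the (1-alpha/2) normal quantile. A quick stdlib-Python check against the first two Table I rows (conf=0.9, d=0.1), with a helper name of my own choosing:

```python
from math import ceil
from statistics import NormalDist

def approx_n(p0, conf, d):
    """Normal-approximation sample size: ceil(p0*(1-p0)*(z/d)**2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # probit(1 - alpha/2)
    return ceil(p0 * (1 - p0) * (z / d) ** 2)

print(approx_n(0.50, 0.9, 0.1), approx_n(0.55, 0.9, 0.1))  # 68 67
```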
proc sort data=test;
by conf descending d p0;
run;
/* Read Table I of the article, "Exact" and "Approx" columns stacked side by side */
data orig;
  input e a;                      /* e: "Exact" column, a: "Approx" column       */
  p0=(50+5*mod(_n_-1,9))/100;     /* p0 cycles through 0.50, 0.55, ..., 0.90     */
  if mod(_n_-1,18)<9 then d=0.1;  /* first nine rows of each 18-row block: d=0.1 */
  else d=0.05;
  if _n_<=18 then conf=0.9;
  else if _n_<=36 then conf=0.95;
  else conf=0.99;
cards;
68 68
68 67
65 65
... /* please copy from the PDF yourself */
353 339
260 239
;
/* Compare the results: e1, e2 and e2t are each compared to column e */
proc compare data=test c=orig;
  id p0 conf d notsorted;
  var e:;
  with e e e;
run;
/* Investigate a particular discrepancy */
data ttt;
  pL=0.45; pU=0.65; conf=0.9;
  alpha=round(1-conf, 1e-10);
  put alpha= best20.;
  put alpha= hex16.;
  p0=round((pL+pU)/2, 1e-10);
  put p0= best20.;
  put p0= hex16.;
  do n=1 to 120;
    x=round(n*p0);
    p=(pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2+1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n);
    output;
  end;
run;

proc print width=min;
  format p best16.;
run;
/* Calculate some exact tail probabilities (formulas on p. 2858) */
data _null_;
  x1=1-cdf('binom',36,0.45,67); /* upper tail at pL=0.45 */
  x2=cdf('binom',37,0.65,67);   /* lower tail at pU=0.65 */
  x=x1+x2;
  put (x:)(=);
run; /* 0.122179 */

data _null_;
  x1=1-cdf('binom',36,0.45,68);
  x2=cdf('binom',37,0.65,68);
  x=x1+x2;
  put (x:)(=);
run; /* 0.121597 */

data _null_;
  x1=1-cdf('binom',40,0.45,75);
  x2=cdf('binom',41,0.65,75);
  x=x1+x2;
  put (x:)(=);
run; /* 0.100311 */

data _null_;
  x1=1-cdf('binom',41,0.45,76);
  x2=cdf('binom',42,0.65,76);
  x=x1+x2;
  put (x:)(=);
run; /* 0.096772 */
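The same exact tail sums, P(X >= x | pL) + P(X <= x | pU) with x = round(n*p0), can be reproduced in stdlib Python (helper names and default arguments are mine, fixed to this example's pL=0.45, pU=0.65):

```python
from math import comb, floor

def binom_cdf(k, p, n):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_tails(n, pL=0.45, pU=0.65, p0=0.55):
    """P(X >= x | pL) + P(X <= x | pU), x = round(n*p0) (halves away from zero)."""
    x = floor(n * p0 + 0.5)
    return 1 - binom_cdf(x - 1, pL, n) + binom_cdf(x, pU, n)

for n in (67, 68, 75, 76):
    # should reproduce 0.122179, 0.121597, 0.100311, 0.096772 from above
    print(n, round(exact_tails(n), 6))
```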
I hope this helps. Please don't hesitate to ask if you have further questions.
In any case, as @ballardw has suggested, you can calculate tail probabilities fairly easily, both exactly and approximately using equation (1); see my code above. So, for a given combination of parameters you will be able to obtain the optimal sample size with little effort and without complicated general algorithms.