01-06-2016 01:57 PM
Please help me with the code for estimating sample sizes for binomial proportions when we have the values for proportion, error and confidence levels.
Two methods of interest are:
1. Sample size using modified exact computer algorithm
2. Sample size using large sample normal approximation method
01-06-2016 03:22 PM
Have you looked at the documentation for Proc Power? Or do you have the SAS Power and Sample Size wizard installed, which is an option when installing SAS/STAT?
01-06-2016 03:37 PM
For item 2, the approximate sample size n needed to obtain a 100(1-alpha)% confidence interval for a binomial proportion p with half-width d (i.e., total length 2d), using a planning value p0 (set p0=0.5 to achieve the most conservative n), I suggest the following (with arbitrary values for p0, d and alpha just as an example):
data _null_;
  p0=0.2; d=0.08; alpha=0.1;
  n=ceil(p0*(1-p0)*(probit(1-alpha/2)/d)**2);
  put n=;
run;
(formula according to the classic reference Statistical Intervals: A Guide for Practitioners, p. 145)
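As a quick sanity check of the formula (my own illustrative numbers, not taken from the book): with the most conservative planning value p0=0.5, alpha=0.05 and d=0.03, it reproduces the familiar "about a thousand respondents" sample size of opinion polls:

```sas
/* Most conservative case: p0=0.5, 95% confidence, margin of error 0.03 */
data _null_;
  p0=0.5; d=0.03; alpha=0.05;
  n=ceil(p0*(1-p0)*(probit(1-alpha/2)/d)**2);
  put n=;  /* n=1068 */
run;
```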
As to item 1, I'm not sure what the "modified exact computer algorithm" is. Do you refer to this article by G.T. Fosgate? I don't have access to it.
01-07-2016 01:51 PM
Thank you very much for the code. It worked for the normal approximation.
Yes, I was referring to the article by G.T. Fosgate. I tried to find the minimum sample size using Proc Power and it doesn't work. I have access to the article and am attaching it here. I am confused about which SAS function deals with binomial proportions. Thanks a lot for all the help.
01-07-2016 05:36 PM
Many thanks for providing the article. I have read the relevant parts of it. Very interesting! Tomorrow (CET) I will start trying to implement that algorithm in SAS.
01-08-2016 11:26 AM
Your online documentation for Proc Power should have an example named "The Sawtooth Power Function in Proportion Analysis".
It demonstrates using the normal approximation to get an estimated sample size and then examining the power available for sample sizes around that estimate. Use those results to pick a sample size and examine the exact power. If that value works for your project, fine; otherwise pick another, wash, rinse, repeat until you have a sample size and power that you are satisfied with.
The Power procedure doesn't solve for n at a specific power for the exact binomial test because, due to the discrete behavior of the binomial distribution, there will almost never be an exact sample-size match for a given power and proportion. So if you are looking for a power of 0.8, you may have to use a sample size that yields a power of 0.791 or 0.808 or some such.
Also review generating exact confidence intervals for small-sample (n < 20) binomial probabilities.
The example also demonstrates that the power may actually decrease for minor increases in sample size.
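The workflow above can be sketched with Proc Power's ONESAMPLEFREQ statement, requesting exact power over a range of sample sizes (the proportions and range here are illustrative, not from this thread):

```sas
/* Exact power of the one-sample binomial test over a grid of n,
   to see the sawtooth pattern around the approximate sample size */
proc power;
   onesamplefreq test=exact
      nullproportion = 0.2    /* proportion under H0 */
      proportion     = 0.3    /* assumed true proportion */
      ntotal         = 100 to 120 by 1
      power          = .;     /* solve for power */
run;
```

Scanning the resulting power column shows why a sample size solving exactly for power=0.8 generally does not exist, and why power can dip when n increases by one.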
01-08-2016 03:45 PM
Thanks for the detailed example. I am confused about what the null proportion is in Proc Power. The other confusion is about the hypothesis for the power in this scenario.
What would the hypothesis be here? H0: confidence intervals using mid-P greater than confidence intervals using the normal approximation?
And what is the null proportion? I know that the proportion in Proc Power is the binomial proportion.
Thanks again for all the inputs. I appreciate your explanation. Sorry if these sound a bit dumb.
01-08-2016 03:47 PM
Thanks again for all your effort. I was just looking for a similar equation using the mid-P confidence interval for binomial proportions. Your previous code was simple and effective.
01-08-2016 04:59 PM
Sorry for not responding earlier (was busy with other work). I have a draft version of a function f1 which I thought would be an implementation of G. T. Fosgate's algorithm. However, when I compared his Table I with my results, I got a lot of discrepancies for the "Exact" columns (whereas the "Approx" columns matched perfectly, which was of course not that surprising).
Then I created a slightly modified version f2 of my function (introduced a "tolerance" parameter), which takes the "sawtooth" effect into account (cf. @ballardw's post). As a result, the number of discrepancies decreased (and could be further decreased by modifying the tolerance parameter), but was still substantial, unfortunately.
I have scrutinized two of the discrepant cases and used computer algebra software to evaluate equation (1) of the article for one case, in order to make sure that no numerical accuracy issues were involved. Result: The probability value calculated by SAS was accurate to 13 significant decimal places! So, I'm fairly confident that the accuracy is sufficient (at least for this case). In fact, the calculation showed that for n=67, p0=0.55, d=0.1 (i.e., the confidence interval [0.45, 0.65]) and alpha=0.1 the LHS of equation (1), using x=37 (because 0.55*67=36.85) results in 0.0976628..., which is apparently <alpha. According to p. 2860 of the article, the "appropriate sample size has been reached when the sum of the tail probabilities [calculated with eqn. (1)] is less than ... alpha ..." Hence, it should be 67 (since all smaller sample sizes led to values >0.1). Interestingly, the "sawtooth" effect does not interfere with this result (for this specific combination of parameters), i.e., when the sample size is increased, the calculated probability remains <0.1.
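That check can be reproduced directly in a short data step, using the same mid-P tail sum (equation (1)) as in the f1/f2 functions of my draft code below:

```sas
/* Evaluate equation (1) for n=67, p0=0.55, d=0.1 (CI [0.45, 0.65]), alpha=0.1 */
data _null_;
  pL=0.45; pU=0.65; n=67;
  x=round(n*0.55);   /* 0.55*67=36.85, so x=37 */
  p=(pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2
    +1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n);
  put x= p=;         /* p=0.0976628..., i.e. < alpha=0.1 */
run;
```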
So the question is why G.T. Fosgate's algorithm obtained 68, not 67 (see Table I). It is true, however, that the exact tail probabilities (as calculated with the formulas on p. 2858) sum to more than 0.1, but they would not drop below this value before n=76, if I'm not mistaken.
There is also another minor issue: On p. 2860 it says that the "value of x ... is always 1 at the first iteration ..." I'm not too sure about this: Wouldn't x be 99 for p0=0.99 and n=100 or did I misunderstand the author?
Anyway, here is my draft code:
/* Define sample size functions f1 and the modified version f2 */
proc fcmp outlib=work.funcs.test;

  function f1(pL, pU, conf); /* pL, pU: lower/upper confidence limit, conf: confidence level */
    alpha=round(1-conf, 1e-10);
    p0=(pL+pU)/2;
    do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
    end;
    do n=1 to 10000 until(p<alpha);
      x=round(n*p0);
      p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2
              +1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n), 1e-10);
    end;
    return(n);
  endsub;

  function f2(pL, pU, conf, tol); /* tol: tolerance parameter */
    alpha=round(1-conf, 1e-10);
    p0=(pL+pU)/2;
    do n0=1 to 10000 until(n0*p0-int(n0*p0)<1e-10);
    end;
    do n=n0 to 10000;
      x=round(n*p0);
      p=round((pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2
              +1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n), 1e-10);
      if p<alpha & n1=. then n1=n;
      else if p>=alpha+tol then n1=.;
    end;
    return(n1);
  endsub;

quit;

options cmplib=work.funcs;

/* Try to replicate the values corresponding to Table I of the article */
data test;
  do i=50 to 90 by 5;
    p0=i/100;
    do conf=0.9, 0.95, 0.99;
      do d=0.1, 0.05;
        e1=f1(p0-d, p0+d, conf);
        e2=f2(p0-d, p0+d, conf, 0);
        e2t=f2(p0-d, p0+d, conf, 0.0005);
        a=ceil(p0*(1-p0)*(probit(1-(1-conf)/2)/d)**2);
        output;
      end;
    end;
  end;
  drop i;
run;

proc sort data=test;
  by conf descending d p0;
run;

/* Read Table I of the article, "Exact" and "Approx" columns stacked side by side */
data orig;
  input e a;
  p0=(50+5*mod(_n_-1,9))/100;
  if mod(_n_-1,18)<9 then d=0.1;
  else d=0.05;
  if _n_<=18 then conf=0.9;
  else if _n_<=36 then conf=0.95;
  else conf=0.99;
  cards;
68 68
68 67
65 65
...
/* please copy from the PDF yourself */
353 339
260 239
;

/* Compare the results */
proc compare data=test c=orig;
  id p0 conf d notsorted;
  var e:;
  with e e e;
run;

/* Investigate a particular discrepancy */
data ttt;
  pL=0.45; pU=0.65; conf=0.9;
  alpha=round(1-conf, 1e-10);
  put alpha= best20.;
  put alpha= hex16.;
  p0=round((pL+pU)/2, 1e-10);
  put p0= best20.;
  put p0= hex16.;
  do n=1 to 120;
    x=round(n*p0);
    p=(pdf('binom',x,pL,n)+pdf('binom',x,pU,n))/2
      +1-cdf('binom',x,pL,n)+cdf('binom',x-1,pU,n);
    output;
  end;

proc print width=min;
  format p best16.;
run;

/* Calculate some exact tail probabilities */
data _null_;
  x1=1-cdf('binom',36,0.45,67);
  x2=cdf('binom',37,0.65,67);
  x=x1+x2;
  put (x:)(=);
run; /* 0.122179 */

data _null_;
  x1=1-cdf('binom',36,0.45,68);
  x2=cdf('binom',37,0.65,68);
  x=x1+x2;
  put (x:)(=);
run; /* 0.121597 */

data _null_;
  x1=1-cdf('binom',40,0.45,75);
  x2=cdf('binom',41,0.65,75);
  x=x1+x2;
  put (x:)(=);
run; /* 0.100311 */

data _null_;
  x1=1-cdf('binom',41,0.45,76);
  x2=cdf('binom',42,0.65,76);
  x=x1+x2;
  put (x:)(=);
run; /* 0.096772 */
I hope this helps. Please don't hesitate to ask if you have further questions.
In any case, as @ballardw has suggested, you can calculate tail probabilities fairly easily, both exactly and approximately using equation (1); see my code above. So, for a given combination of parameters you will be able to obtain the optimal sample size with little effort and without complicated general algorithms.
01-08-2016 05:28 PM
Thanks again for all the help. I value your time and effort. I will try to grasp and replicate the code and process you have described and come back with my results.
I will come back to your last question too. Have a wonderful weekend.