Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Binomial Confidence Intervals without dataset

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 02-01-2016 06:18 PM
(2670 views)

How can I calculate a confidence interval for a binomial distributed proportion without using a dataset?

Is there a comman equivalent to the Stata cii n Y command?

I have the sample size and probability of success, and I need to calculate the CI.

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If your sample size is big ,and can use Normal Asymptoic distribution to calculate it . Otherwise , you need calculated the quantitle of binomial distribution.

here is the code from Rick's blog .

http://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html

p = mean(x); /* Probability of success */ se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */ z = quantile("Normal", 1-alpha/2); /* two-sided,if n is samll, try Binomial */ LowerCL = p - z*se; UpperCL = p + z*se;

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If your sample size is big ,and can use Normal Asymptoic distribution to calculate it . Otherwise , you need calculated the quantitle of binomial distribution.

here is the code from Rick's blog .

http://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html

p = mean(x); /* Probability of success */ se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */ z = quantile("Normal", 1-alpha/2); /* two-sided,if n is samll, try Binomial */ LowerCL = p - z*se; UpperCL = p + z*se;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

great, thank you!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You can approximate the solution by creating a dataset that has the properties and then using proc freq.

I've included the code below and for comparisons @Ksharp s code so you can test your results.

Hope this helps.

```
data have;
n=2000;
p=0.7;
type='X';
X=floor(p*n); output;
type='Y'; x=n-x; output;
run;
data ci;
alpha=0.05;
n=2000;
p=0.7; /* Probability of success */
se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */
z = quantile("Normal", 1-(alpha/2)); /* two-sided,if n is samll, try Binomial */
LowerCL = p - z*se;
UpperCL = p + z*se;
output;
run;
proc freq data=have;
table type/binomial (p=0.7);
weight x;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

thank you. This is what I was considering doing. It seems it might be a bit more straightforward.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It should be noted that the CI computed by the formula suggested is not identical to the similar CI computed by PROC FREQ. The latter is the well-known Wald CI with denominator n in the radicand, whereas the formula copied from @Rick_SAS's blog uses denominator n-1.

The standard error of point estimator p (sample proportion of successes) of binomial parameter P is sqrt(P*(1-P)/n) (Fleiss et al., Statistical Methods for Rates and Proportions, 3rd ed., p. 26). That's where denominator n comes from.

Denominator n-1 could be justified by the fact that n*p*(1-p)/(n-1) is the uniform minimum variance unbiased estimator of P*(1-P) (Lehmann/Casella, Theory of Point Estimation, 2nd ed., p. 100).

Obviously, with denominator n-1 the CI is wider, but only slightly so if n is large.

Just for curiosity I computed coverage probabilities for this CI ("CI2") and compared them to those of the Wald CI ("CI1") and of the Wald CI with continuity correction ("CI3"), p -/+ (z*sqrt(p*(1-p)/n)+1/(2*n)) (Fleiss et al., p. 29; z=probit(0.975)), all for confidence level 0.95. The investigation was restricted to 17<=n<=200 and P=k/10000 with integers k such that nP>=5 and n(1-P)>=5 (criterion from Fleiss et al., p. 26), a total of 1,590,280 pairs (n, P).

__Some results:__

- Among the investigated 184 values of n, the proportion of cases (i.e. values of p) with coverage probabilities (CP) >=0.95 varied between 36.6% and 84.1% for CI3 (>=50% for each n>=24), but reached only 0.3 - 24.5% for CI1 and 4.5 - 28.1% for CI2.
- Ranges of minimum coverage probabilities were [0.873, 0.894] for CI1, [0.874, 0.899] for CI2 and still only [0.896, 0.914] for CI3, in spite of the restrictions nP>=5 and n(1-P)>=5.
- CP for CI1 and CI2 were equal in 83.6 - 94.8% of cases ("CP1=CP2").
- For each n (14, 15, ..., 200) there were >1.5% cases with CP1 < CP2 <= 0.95. For some n, e.g. n=34, this proportion exceeded 10%.
- The proportion of cases with CP1 < 0.95 < CP2 ranged from 1.9% to 11.1%. In most (>85%) of these cases CP2 was closer to 0.95 than CP1.
- Not a single case was observed in which 0.95 <= CP1 < CP2.
- In many cases (26.8 - 55.9%) CP2 was equal to CP3.
- Proportions of cases with CP2 < 0.95 < CP3 and CP3 closer to 0.95 had a similar range: 22.4 - 50.7%.
- The range of proportions of cases with 0.95 < CP2 < CP3 was only 0 - 8.0%.
- For only 16 values of n the proportion of cases with either CP1 or CP2 in ]0.94, 0.96[ and CP3 outside this interval was greater than the proportion of cases showing the opposite behavior.
- Overall, the proportions of cases with CP in ]0.94, 0.96[ were 61.3%, 65.8% and 71.6%, respectively (CP1, CP2, CP3).

So, at least for 17<=n<=200 the CI with denominator n-1 has advantages over the standard Wald CI. In many of these cases a continuity correction makes sense.

However, there are CIs for the binomial proportion with even better coverage properties without being overly conservative (as the Clopper-Pearson CI): see Brown et al. (2001), Interval Estimation for a Binomial Proportion, where also some of the above results on CI1 (and many more) can be found.

Below (under the spoiler tag) is part of the code I used for my investigations.

Spoiler

```
%macro binci(n);
/* Compute the three 95% CIs for all possible outcomes of n Bernoulli samples */
data ci95;
do k=0 to &n;
p0=k/&n;
z=probit(0.975);
d=z*sqrt(p0*(1-p0)/&n);
pL1=p0-d;
pU1=p0+d;
d=d+1/(2*&n);
pL3=p0-d;
pU3=p0+d;
d=z*sqrt(p0*(1-p0)/(&n-1));
pL2=p0-d;
pU2=p0+d;
output;
end;
drop p0 z d;
run;
/* Determine the range of true parameter p (times 10000) in which np>=5 and n(1-p)>=5 */
%let iL=%sysfunc(ceil(50000/&n));
%let iU=%eval(10000-&iL);
/* Calculate coverage probabilities depending on p, incrementing p by 0.0001 */
data tmp;
set ci95;
do i=&iL to &iU;
p=i/10000;
w1=(pL1 <= p <= pU1)*pdf('binom',k,p,&n);
w2=(pL2 <= p <= pU2)*pdf('binom',k,p,&n);
w3=(pL3 <= p <= pU3)*pdf('binom',k,p,&n);
output;
end;
drop i;
run;
proc summary data=tmp nway;
class p;
var w1-w3;
output out=covprob sum=;
run;
/* Compute descriptive statistics and create plot for coverage probabilities */
proc means data=covprob mean std min q1 median q3 max;
var w1-w3;
run;
goptions reset=all;
symbol1 v=x c=red h=0.9;
symbol2 v=plus c=orange h=0.6;
symbol3 v=dot c=green h=0.3;
axis1 order=(0.85 to 1 by 0.005)
label=(angle=90 'Coverage probability');
legend1 label=('CI') value=('Wald' 'Wald, but with denominator n-1' 'Wald with continuity correction')
across=1 mode=protect position=(top right inside) shape=symbol(3, .9);
title "Coverage Probabilities of 95% Confidence Intervals for Binomial Parameter p when n=&n, np>=5 and n(1-p)>=5";
proc gplot data=covprob;
plot w1*p=1 w2*p=2 w3*p=3 / overlay vaxis=axis1 vref=0.95 legend=legend1;
run;
quit;
title;
%mend binci;
%binci(17) /* n= 17, p=0.2942, 0.2943, 0.2944, ..., 0.7057, 0.7058 */
%binci(25) /* n= 25, p=0.2, 0.2001, 0.2002, ..., 0.7999, 0.8 */
%binci(50) /* n= 50, p=0.1, 0.1001, 0.1002, ..., 0.8999, 0.9 */
%binci(100) /* n=100, p=0.05, 0.0501, 0.0502, ..., 0.9499, 0.95 */
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@FreelanceReinh and @Ksharp Sorry about that n-1 in the denominator. I suspect I copied and pasted that code from some earlier program that computed CIs for a different distribution. I intended to use the same formula that PROC FREQ uses. I will have to edit the blog post to correct that error. I suggest that the OP use the (correct!) Wald formula.

By the way, I wrote a blog post about the normal approximation to the binomial distribution, which is the basis for the CI formula. The experts might enjoy reading it.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

thank you, everoyone. it's good to know there are options.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.