Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Binomial Confidence Intervals without dataset

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 06:18 PM - edited 02-01-2016 06:56 PM

How can I calculate a confidence interval for a binomial distributed proportion without using a dataset?

Is there a comman equivalent to the Stata cii n Y command?

I have the sample size and probability of success, and I need to calculate the CI.

Thank you.

Accepted Solutions

Solution

02-01-2016
09:02 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 08:33 PM

If your sample size is big ,and can use Normal Asymptoic distribution to calculate it . Otherwise , you need calculated the quantitle of binomial distribution.

here is the code from Rick's blog .

http://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html

p = mean(x); /* Probability of success */ se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */ z = quantile("Normal", 1-alpha/2); /* two-sided,if n is samll, try Binomial */ LowerCL = p - z*se; UpperCL = p + z*se;

All Replies

Solution

02-01-2016
09:02 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 08:33 PM

If your sample size is big ,and can use Normal Asymptoic distribution to calculate it . Otherwise , you need calculated the quantitle of binomial distribution.

here is the code from Rick's blog .

http://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html

p = mean(x); /* Probability of success */ se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */ z = quantile("Normal", 1-alpha/2); /* two-sided,if n is samll, try Binomial */ LowerCL = p - z*se; UpperCL = p + z*se;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 09:03 PM

great, thank you!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 09:03 PM

You can approximate the solution by creating a dataset that has the properties and then using proc freq.

I've included the code below and for comparisons @Ksharp s code so you can test your results.

Hope this helps.

```
data have;
n=2000;
p=0.7;
type='X';
X=floor(p*n); output;
type='Y'; x=n-x; output;
run;
data ci;
alpha=0.05;
n=2000;
p=0.7; /* Probability of success */
se = sqrt(p*(1-p) / (n-1)); /* standard error,n is the sample size */
z = quantile("Normal", 1-(alpha/2)); /* two-sided,if n is samll, try Binomial */
LowerCL = p - z*se;
UpperCL = p + z*se;
output;
run;
proc freq data=have;
table type/binomial (p=0.7);
weight x;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-01-2016 09:06 PM

thank you. This is what I was considering doing. It seems it might be a bit more straightforward.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-11-2016 06:15 PM

It should be noted that the CI computed by the formula suggested is not identical to the similar CI computed by PROC FREQ. The latter is the well-known Wald CI with denominator n in the radicand, whereas the formula copied from @Rick_SAS's blog uses denominator n-1.

The standard error of point estimator p (sample proportion of successes) of binomial parameter P is sqrt(P*(1-P)/n) (Fleiss et al., Statistical Methods for Rates and Proportions, 3rd ed., p. 26). That's where denominator n comes from.

Denominator n-1 could be justified by the fact that n*p*(1-p)/(n-1) is the uniform minimum variance unbiased estimator of P*(1-P) (Lehmann/Casella, Theory of Point Estimation, 2nd ed., p. 100).

Obviously, with denominator n-1 the CI is wider, but only slightly so if n is large.

Just for curiosity I computed coverage probabilities for this CI ("CI2") and compared them to those of the Wald CI ("CI1") and of the Wald CI with continuity correction ("CI3"), p -/+ (z*sqrt(p*(1-p)/n)+1/(2*n)) (Fleiss et al., p. 29; z=probit(0.975)), all for confidence level 0.95. The investigation was restricted to 17<=n<=200 and P=k/10000 with integers k such that nP>=5 and n(1-P)>=5 (criterion from Fleiss et al., p. 26), a total of 1,590,280 pairs (n, P).

__Some results:__

- Among the investigated 184 values of n, the proportion of cases (i.e. values of p) with coverage probabilities (CP) >=0.95 varied between 36.6% and 84.1% for CI3 (>=50% for each n>=24), but reached only 0.3 - 24.5% for CI1 and 4.5 - 28.1% for CI2.
- Ranges of minimum coverage probabilities were [0.873, 0.894] for CI1, [0.874, 0.899] for CI2 and still only [0.896, 0.914] for CI3, in spite of the restrictions nP>=5 and n(1-P)>=5.
- CP for CI1 and CI2 were equal in 83.6 - 94.8% of cases ("CP1=CP2").
- For each n (14, 15, ..., 200) there were >1.5% cases with CP1 < CP2 <= 0.95. For some n, e.g. n=34, this proportion exceeded 10%.
- The proportion of cases with CP1 < 0.95 < CP2 ranged from 1.9% to 11.1%. In most (>85%) of these cases CP2 was closer to 0.95 than CP1.
- Not a single case was observed in which 0.95 <= CP1 < CP2.
- In many cases (26.8 - 55.9%) CP2 was equal to CP3.
- Proportions of cases with CP2 < 0.95 < CP3 and CP3 closer to 0.95 had a similar range: 22.4 - 50.7%.
- The range of proportions of cases with 0.95 < CP2 < CP3 was only 0 - 8.0%.
- For only 16 values of n the proportion of cases with either CP1 or CP2 in ]0.94, 0.96[ and CP3 outside this interval was greater than the proportion of cases showing the opposite behavior.
- Overall, the proportions of cases with CP in ]0.94, 0.96[ were 61.3%, 65.8% and 71.6%, respectively (CP1, CP2, CP3).

So, at least for 17<=n<=200 the CI with denominator n-1 has advantages over the standard Wald CI. In many of these cases a continuity correction makes sense.

However, there are CIs for the binomial proportion with even better coverage properties without being overly conservative (as the Clopper-Pearson CI): see Brown et al. (2001), Interval Estimation for a Binomial Proportion, where also some of the above results on CI1 (and many more) can be found.

Below (under the spoiler tag) is part of the code I used for my investigations.

Spoiler

```
%macro binci(n);
/* Compute the three 95% CIs for all possible outcomes of n Bernoulli samples */
data ci95;
do k=0 to &n;
p0=k/&n;
z=probit(0.975);
d=z*sqrt(p0*(1-p0)/&n);
pL1=p0-d;
pU1=p0+d;
d=d+1/(2*&n);
pL3=p0-d;
pU3=p0+d;
d=z*sqrt(p0*(1-p0)/(&n-1));
pL2=p0-d;
pU2=p0+d;
output;
end;
drop p0 z d;
run;
/* Determine the range of true parameter p (times 10000) in which np>=5 and n(1-p)>=5 */
%let iL=%sysfunc(ceil(50000/&n));
%let iU=%eval(10000-&iL);
/* Calculate coverage probabilities depending on p, incrementing p by 0.0001 */
data tmp;
set ci95;
do i=&iL to &iU;
p=i/10000;
w1=(pL1 <= p <= pU1)*pdf('binom',k,p,&n);
w2=(pL2 <= p <= pU2)*pdf('binom',k,p,&n);
w3=(pL3 <= p <= pU3)*pdf('binom',k,p,&n);
output;
end;
drop i;
run;
proc summary data=tmp nway;
class p;
var w1-w3;
output out=covprob sum=;
run;
/* Compute descriptive statistics and create plot for coverage probabilities */
proc means data=covprob mean std min q1 median q3 max;
var w1-w3;
run;
goptions reset=all;
symbol1 v=x c=red h=0.9;
symbol2 v=plus c=orange h=0.6;
symbol3 v=dot c=green h=0.3;
axis1 order=(0.85 to 1 by 0.005)
label=(angle=90 'Coverage probability');
legend1 label=('CI') value=('Wald' 'Wald, but with denominator n-1' 'Wald with continuity correction')
across=1 mode=protect position=(top right inside) shape=symbol(3, .9);
title "Coverage Probabilities of 95% Confidence Intervals for Binomial Parameter p when n=&n, np>=5 and n(1-p)>=5";
proc gplot data=covprob;
plot w1*p=1 w2*p=2 w3*p=3 / overlay vaxis=axis1 vref=0.95 legend=legend1;
run;
quit;
title;
%mend binci;
%binci(17) /* n= 17, p=0.2942, 0.2943, 0.2944, ..., 0.7057, 0.7058 */
%binci(25) /* n= 25, p=0.2, 0.2001, 0.2002, ..., 0.7999, 0.8 */
%binci(50) /* n= 50, p=0.1, 0.1001, 0.1002, ..., 0.8999, 0.9 */
%binci(100) /* n=100, p=0.05, 0.0501, 0.0502, ..., 0.9499, 0.95 */
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-12-2016 06:31 AM

@FreelanceReinhard and @Ksharp Sorry about that n-1 in the denominator. I suspect I copied and pasted that code from some earlier program that computed CIs for a different distribution. I intended to use the same formula that PROC FREQ uses. I will have to edit the blog post to correct that error. I suggest that the OP use the (correct!) Wald formula.

By the way, I wrote a blog post about the normal approximation to the binomial distribution, which is the basis for the CI formula. The experts might enjoy reading it.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-14-2016 04:54 PM

After further thought, I no longer think there was an error in my original blog post. My blog post was about estimating a p-value for a Monte Carlo simulation. As shown in this PROC FREQ documentation, the standard error for the Monte Carlo estimate of a binomial proportion has an N-1 in the denominator. It hardly matters when you are doing 10,000 Monte Carlo simulations, but N-1 is the correct denominator. (The OP for this problem should still use N, since that is correct for the SE for the usual estimator p_hat = k/N.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-12-2016 08:40 AM

thank you, everoyone. it's good to know there are options.