Solved: Re: compute an exact confidence interval for a single proportion by hand to match output from proc freq

MaryA_Marion · Posted 09-10-2021 02:46 PM

Output from proc freq gives a Clopper-Pearson (exact) confidence interval, but how do I compute the accompanying statistic for a test of p = p0 and its p-value?

Also, when I am working with a single binomial proportion, how is Fisher's exact test computed? It is also listed on the output. Where can I find proc freq documentation about that? I must be using the wrong search criteria, for I am not seeing the answer to either of these questions. My notes have approximate and exact z statistics varying by denominator, as seen in the snapshot at the beginning of this post.

title " Ho: p=.7 vs Ha: p ne .7";
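/* One row per response level; Count is used as the cell frequency below */
data CellCounts;
  input R $ Count;
  np = 40*.7;   /* expected successes under H0: n*p0 */
  nq = 40*.3;   /* expected failures under H0: n*(1-p0) */
  datalines;
yes 35
no 5
;
run;

/* order=freq lists "yes" (the larger count) first, so the binomial proportion refers to "yes" */
proc freq data = CellCounts order=freq;
  tables R / binomial(p=0.7 exact score) alpha=.05;
  exact binomial;   /* request exact p-values for the binomial test */
  weight Count;
run;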

Thank you. MM

jsuebersax · Posted 09-10-2021 04:17 PM

Hi MM,

Please let me be sure I understand the issue correctly. It appears to me that (a) the binomial option of the tables statement in Proc Freq lets one calculate confidence intervals using various methods (exact, Wald, score, etc), but (b) in testing a hypothesis comparing an observed proportion to a hypothesized value (e.g., 0.07) that test is always the same, regardless of the method used for estimating confidence intervals.

So, for example, if you ask for a Clopper-Pearson confidence interval, SAS will calculate that; but if you also request a hypothesis test, SAS will always use an asymptotic (probably Wald) method. Is that what you've found?

I'll try to look into this further. Note, though, that if the hypothesized rate (0.07) falls outside the confidence limits for the observed data (however those are calculated), some people might take that as sufficient evidence to reject H0; not sure if that's strictly kosher, though.

Come to think of it, I don't see why one couldn't just examine the binomial distribution for pi = 0.07 and the given sample size n. If the observed number of successes k falls in the lower or upper .025 area of that distribution, one might reject H0: pi = 0.07. SAS has built-in functions for the cumulative binomial distribution, so one could perform a hypothesis test and get a p-value that way.

What I'm also wondering at this point is whether Clopper-Pearson is only designed to supply confidence intervals, not to perform hypothesis tests. For the latter, exact binomial distributions might be better.

Hope this helps.

--

John Uebersax

MaryA_Marion · Posted 09-10-2021 09:57 PM

John,

Thank you for your reply. SAS is not always consistent in proc freq. If you look at my calculations in the enclosed file, you will see that the Wald test and Wald confidence interval do not appear on the SAS output on page 1. I'm trying to figure out which exact test is being computed, and how. Can you help?

MaryA_Marion · Posted 09-11-2021 11:28 AM

I closed this question too quickly; please keep it open. The Clopper-Pearson and Wilson score statistics for p = .875 still need to be done by hand. I am getting a different answer for Wilson. Was the continuity correction not added? MM
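For a hand check of the Wilson (score) limits, here is a minimal sketch that assumes no continuity correction is applied, which is an assumption worth confirming against your own output. The data set name wilson and the variable names are illustrative only; k and n come from the question.

```sas
/* Sketch: Wilson (score) confidence limits WITHOUT continuity correction. */
/* Assumption: the limits being matched do not use a continuity correction. */
data wilson;
  k = 35;                                /* observed count in the category of interest */
  n = 40;                                /* total sample size */
  alpha = 0.05;                          /* 1 - confidence level */
  phat = k / n;                          /* observed proportion, 0.875 */
  z = quantile('NORMAL', 1 - alpha/2);   /* standard normal critical value */
  center = phat + z**2 / (2*n);
  half = z * sqrt((phat*(1 - phat) + z**2 / (4*n)) / n);
  denom = 1 + z**2 / n;
  lcl = (center - half) / denom;         /* lower Wilson limit */
  ucl = (center + half) / denom;         /* upper Wilson limit */
  put phat= lcl= ucl=;                   /* results are written to the log */
run;
```

If these limits differ from the ones in your notes, the continuity correction is the first thing to compare.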

jsuebersax · Posted 09-10-2021 06:47 PM

More info. You wrote:

> Also, when I am working with a single binomial proportion, how is Fisher's exact test computed?

Fisher's exact test only applies to two-way tables. According to SAS documentation:

> For one-way tables, PROC FREQ provides exact p-values for the binomial proportion test

My guess is that the "binomial proportion test" is basically what I mentioned in my previous answer — i.e., based on the cumulative binomial probability distribution for some hypothesized population proportion and given sample size N.

This code

data _null_;
   /* P(X >= 35) = 1 - P(X <= 34) for X ~ Binomial(n=40, p=0.7) */
   y = 1 - cdf('BINOM', 34, .7, 40);
   put y=;   /* written to the log */
run;

gives the probability of getting 35 or more successes out of 40 trials as 0.008618. That's the same value I get for the One-sided Exact Test using these lines:

data one;
  do i = 1 to 35; x = 1; output; end;
  do i = 1 to 5; x = 2; output; end;
run;
proc freq data = one;
  tables x / binomial(exact score p=0.7);
  exact binomial;
run;

Hope this helps!

p.s. My previous reply mistakenly used p = 0.07, instead of 0.7 as in your question.

--

John Uebersax
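
The original question also asked for the statistic of the test of p = p0. A minimal sketch follows, assuming the asymptotic test uses the standard error computed under H0, sqrt(p0(1-p0)/n); that assumption should be checked against the PROC FREQ output. The data set name ztest and the variable names are illustrative only.

```sas
/* Sketch: asymptotic z test and exact upper-tail p-value for H0: p = p0. */
/* Assumption: the standard error is evaluated at p0 (null variance).     */
data ztest;
  k = 35;                                     /* observed successes */
  n = 40;                                     /* sample size */
  p0 = 0.7;                                   /* hypothesized proportion */
  phat = k / n;                               /* observed proportion */
  ase0 = sqrt(p0*(1 - p0) / n);               /* standard error under H0 */
  z = (phat - p0) / ase0;                     /* z statistic */
  p_right = 1 - probnorm(z);                  /* one-sided asymptotic Pr > z */
  p_two = 2*(1 - probnorm(abs(z)));           /* two-sided asymptotic Pr > |z| */
  p_exact_right = 1 - cdf('BINOM', k - 1, p0, n);  /* exact P(X >= k) under H0 */
  put z= p_right= p_two= p_exact_right=;      /* results are written to the log */
run;
```

The last variable repeats the exact upper-tail calculation from the post above, so the asymptotic and exact p-values can be compared side by side.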

MaryA_Marion · Posted 09-10-2021 10:04 PM

I had not seen this when I replied to your original post. I like what you did. I need to study this carefully. My nonparametric class was a long time ago, and another class always seemed more relevant. I need to turn that around, as it is very relevant now. MM

MaryA_Marion · Posted 09-12-2021 09:50 PM

Conclusion after a lot of study.

The Clopper-Pearson estimation method is based on the exact binomial distribution, not on a large-sample normal approximation. Solving the enclosed equations for p, given n, k and alpha, gives the interval bounds.
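
For readers without the attachment, the defining equations in their usual textbook form are sketched below. The symbols p_L and p_U for the lower and upper limits are notation assumed here; the attachment may use different symbols.

```latex
% Lower limit p_L (for k > 0): upper-tail probability at p_L equals alpha/2
\sum_{x=k}^{n} \binom{n}{x}\, p_L^{x} (1 - p_L)^{n-x} = \frac{\alpha}{2}

% Upper limit p_U (for k < n): lower-tail probability at p_U equals alpha/2
\sum_{x=0}^{k} \binom{n}{x}\, p_U^{x} (1 - p_U)^{n-x} = \frac{\alpha}{2}
```

By convention p_L = 0 when k = 0 and p_U = 1 when k = n.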

Do you know how to do this in SAS (preferably not in proc iml) ?

FreelanceReinh · Posted 09-13-2021 05:44 AM

Hello @MaryA_Marion,

The formula for the Clopper-Pearson confidence interval can be found in subsection Exact (Clopper-Pearson) Confidence Limits under Binomial Proportion in the PROC FREQ documentation → Details → Statistical Computations. In a DATA step it can be implemented as follows:

data cpci;
k=35; /* number of items in category of interest */
n=40; /* total sample size */
alpha=0.05; /* 1 - confidence level */
if k>0 then lcl=1/(1+(n-k+1)/(k*finv(alpha/2,2*k,2*(n-k+1)))); else lcl=0;
if k<n then ucl=1/(1+(n-k)/((k+1)*finv(1-alpha/2,2*(k+1),2*(n-k)))); else ucl=1;
run;
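
As a cross-check, the same limits can be written with beta quantiles instead of F quantiles, using the standard relationship between the binomial tail sums and the beta distribution. This is a minimal sketch; the data set name cpci_beta is illustrative only.

```sas
/* Sketch: Clopper-Pearson limits via beta quantiles (equivalent to the F form). */
data cpci_beta;
  k = 35;        /* number of items in category of interest */
  n = 40;        /* total sample size */
  alpha = 0.05;  /* 1 - confidence level */
  /* lower limit: alpha/2 quantile of Beta(k, n-k+1); defined as 0 when k = 0 */
  if k > 0 then lcl = quantile('BETA', alpha/2, k, n - k + 1);
  else lcl = 0;
  /* upper limit: 1-alpha/2 quantile of Beta(k+1, n-k); defined as 1 when k = n */
  if k < n then ucl = quantile('BETA', 1 - alpha/2, k + 1, n - k);
  else ucl = 1;
  put lcl= ucl=;   /* results are written to the log */
run;
```

Both versions should agree to many decimal places, which is a quick way to catch a mistyped degrees-of-freedom argument.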

MaryA_Marion · Posted 09-13-2021 09:45 AM

I was expecting summation code, but you approached it differently. Please see the output in the enclosed pdf file. Note that the confidence intervals look like one-sided confidence intervals for p=0 and 1. Can I say they are one-sided?

Do you know how to do summation code outside of proc iml, i.e., using Base SAS? Please see the attached tiff.pdf file. I am trying to improve my computational skills.

Thank you.

FreelanceReinh · Posted 09-13-2021 11:48 AM

You can use the summation formulas to verify the confidence limits. In the non-degenerate case 0<x<n (note that the x in your formula corresponds to variable k in my code) both the lower tail probability and the upper tail probability should be a/2 (for a two-sided 100(1-a)% confidence interval). See variables ltailp and utailp in the code below.

data chk(drop=i);
set cpci;
do i=0 to k;
  ltailp+comb(n,i)*ucl**i*ifn(1-ucl | n-i, 1-ucl, 1)**(n-i);
end;
do i=k to n;
  utailp+comb(n,i)*ifn(lcl | i, lcl, 1)**i*(1-lcl)**(n-i);
end;
run;

In the degenerate cases (x=0, x=n) one of the tail probabilities is trivially 1 while the other is a/2. The non-trivial of the two confidence limits is in fact a one-sided confidence limit, but its confidence level is 1-a/2, not 1-a. This interpretation holds for the non-degenerate case as well: For example, the lower limit of a two-sided exact 95% CI equals the lower limit of an upper one-sided exact 97.5% CI. Similarly, the upper limit of a two-sided exact 95% CI equals the upper limit of a lower one-sided exact 97.5% CI. If the statistical plan is to compute a two-sided exact 95% confidence interval, then this is the correct description of the interval also in the degenerate cases.
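
The same tail probabilities can also be obtained from the built-in binomial CDF, as a cross-check on the explicit sums. This is a minimal sketch that assumes the data set cpci from the earlier DATA step already exists; the data set name chk_cdf is illustrative only.

```sas
/* Sketch: verify the confidence limits with the binomial CDF instead of explicit sums. */
/* Assumes data set cpci (variables k, n, lcl, ucl) was created by the earlier step.    */
data chk_cdf;
  set cpci;
  ltailp = cdf('BINOM', k, ucl, n);            /* P(X <= k) when p = ucl */
  if k > 0 then
    utailp = 1 - cdf('BINOM', k - 1, lcl, n);  /* P(X >= k) when p = lcl */
  else utailp = 1;                             /* k = 0: P(X >= 0) is trivially 1 */
  put ltailp= utailp=;                         /* results are written to the log */
run;
```

For 0 < k < n, both values should come out at alpha/2 up to rounding, matching ltailp and utailp from the summation version.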
