MaryA_Marion
Lapis Lazuli | Level 10

[Attached image: single proportions and exact intervals.png]

 

Output from PROC FREQ gives a Clopper-Pearson (exact) confidence interval, but how do I compute the accompanying test statistic for a test of p = p0, and its p-value?

Also, when I am working with a single binomial proportion, how is Fisher's exact test computed? It is also listed on the output. Where can I find PROC FREQ documentation about that? I must be using the wrong search criteria, because I am not finding answers to either question. My notes have approximate and exact z statistics that differ in the denominator, as seen in the snapshot at the beginning of this post.

 

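title "H0: p = 0.7 vs Ha: p ne 0.7";
data CellCounts;
  input R $ Count;
  /* expected counts under H0, for checking the normal approximation */
  np = 40*.7; nq = 40*.3;
  datalines;
yes 35
no 5
;
run;

/* order=freq puts the larger cell (yes) first, so its proportion is the one tested */
proc freq data = CellCounts order=freq;
  tables R / binomial(p=0.7 exact score) alpha=.05;
  exact binomial;
  weight Count;
run;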

Thank you. MM

 

1 ACCEPTED SOLUTION

Accepted Solutions
jsuebersax
Fluorite | Level 6

More info. You wrote:

 

> Also when I am working with single binomial proportion, how is Fishers exact test computed.

 

Fisher's exact test only applies to two-way tables. According to SAS documentation:

 

For one-way tables, PROC FREQ provides exact p-values for the binomial proportion test

My guess is that the "binomial proportion test" is basically what I mentioned in my previous answer — i.e., based on the cumulative binomial probability distribution for some hypothesized population proportion and given sample size N.

This code

data _null_;
   /* P(X >= 35) = 1 - P(X <= 34) for X ~ Binomial(n = 40, p = 0.7) */
   y = 1 - cdf('BINOM', 34, .7, 40);
   put y=;
run;

gives the probability of getting 35 or more successes out of 40 trials as 0.008618. That's the same value I get for the One-sided Exact Test using these lines:

data one;
  do i = 1 to 35; x = 1; output; end;
  do i = 1 to 5; x = 2; output; end;
run;
proc freq data = one;
  tables x / binomial(exact score p=0.7);
  exact binomial;
run;
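
For anyone who wants both tails and a two-sided value from the same calculation, here is a minimal sketch. The data set name exact_test and its variable names are made up for illustration. The doubling rule for the two-sided p-value is my understanding of what PROC FREQ reports for the exact binomial test, so please check it against the documentation. The sketch also assumes k > 0, because k - 1 is passed to the CDF function.

```sas
/* Sketch: exact binomial test of H0: p = 0.7 with k = 35 successes in n = 40 trials */
data exact_test;
  k = 35; n = 40; p0 = 0.7;
  p_upper = 1 - cdf('BINOM', k - 1, p0, n);  /* P(X >= k | p0), needs k > 0 */
  p_lower = cdf('BINOM', k, p0, n);          /* P(X <= k | p0) */
  p_one   = min(p_lower, p_upper);           /* smaller tail as the one-sided p-value */
  p_two   = min(1, 2 * p_one);               /* assumed convention: twice one-sided, capped at 1 */
  put p_upper= p_lower= p_one= p_two=;
run;
```

Here p_upper is the same quantity as the 0.008618 computed above, just written with k - 1 instead of a hard-coded 34.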

Hope this helps!

 

p.s. My previous reply mistakenly used p = 0.07, instead of 0.7 as in your question.

--

John Uebersax

 

 


9 REPLIES 9
jsuebersax
Fluorite | Level 6

Hi MM,

Please let me be sure I understand the issue correctly.  It appears to me that (a) the binomial option of the tables statement in Proc Freq lets one calculate confidence intervals using various methods (exact, Wald, score, etc), but (b) in testing a hypothesis comparing an observed proportion to a hypothesized value (e.g., 0.07) that test is always the same, regardless of the method used for estimating confidence intervals.

 

So, for example, if you ask for a Clopper-Pearson confidence interval, SAS will calculate that; but if you also request a hypothesis test, SAS will always use an asymptotic (probably Wald) method. Is that what you've found?
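
To make the "which asymptotic test" question concrete, here is a rough sketch that computes the z statistic both ways by hand, using the counts and p0 = 0.7 from the original question. The two versions differ only in the standard error in the denominator: one uses the hypothesized proportion, the other the sample proportion (Wald). Which of the two PROC FREQ prints by default is something to confirm against its output or the documentation; the data set and variable names here are hypothetical.

```sas
/* Sketch: asymptotic z tests of H0: p = p0 for a single proportion */
data ztest;
  k = 35; n = 40; p0 = 0.7;
  phat   = k / n;
  se0    = sqrt(p0 * (1 - p0) / n);         /* standard error under H0 */
  sehat  = sqrt(phat * (1 - phat) / n);     /* standard error from the sample (Wald) */
  z_null = (phat - p0) / se0;
  z_wald = (phat - p0) / sehat;
  p_null = 2 * (1 - probnorm(abs(z_null))); /* two-sided normal p-values */
  p_wald = 2 * (1 - probnorm(abs(z_wald)));
  put z_null= p_null= z_wald= p_wald=;
run;
```

Comparing z_null and z_wald with the Z value in the PROC FREQ output shows which denominator was used.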

 

I'll try to look into this further. Note, though, that if the hypothesized rate (0.07) falls outside the confidence limits for the observed data (however those are calculated), some people might take that as sufficient evidence to reject H0; not sure if that's strictly kosher, though.


Come to think of it, I don't see why one couldn't just examine the binomial distribution for pi = 0.07 and the given sample size n.  If k observed successes falls in the lower or upper .025 area of that distribution, one might reject H0:pi =0.07.  SAS has built in functions for the cumulative binomial distribution, so one could perform a hypothesis test and get a p-value that way.

What I'm also wondering at this point is whether Clopper-Pearson is only designed to supply confidence intervals, not to perform hypothesis tests. For the latter, exact binomial distributions might be better. 

Hope this helps.

--

John Uebersax

MaryA_Marion
Lapis Lazuli | Level 10

John,

Thank you for your reply. SAS is not always consistent in proc freq. If you look at my calculations in the enclosed file you will see that the Wald test and Wald confidence interval do not appear on the SAS output on page 1. I'm trying to figure out what exact test is being computed and how? Can you help?

MaryA_Marion
Lapis Lazuli | Level 10

I closed this question too quickly. Please keep it open. The Clopper-Pearson and Wilson score statistics for the observed proportion p = .875 need to be done by hand. I am getting a different answer for Wilson. Was a continuity correction not added?  MM
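
For a hand check of the Wilson (score) limits, here is a small sketch using the standard closed-form expression without a continuity correction, with k = 35 and n = 40 from the question. The data set and variable names are made up for illustration. If a hand calculation includes a continuity correction, it will not match these numbers.

```sas
/* Sketch: Wilson score interval, no continuity correction */
data wilson;
  k = 35; n = 40; alpha = 0.05;
  phat   = k / n;
  z      = probit(1 - alpha/2);                               /* normal quantile */
  center = phat + z**2 / (2*n);
  half   = z * sqrt((phat*(1 - phat) + z**2 / (4*n)) / n);
  denom  = 1 + z**2 / n;
  lcl    = (center - half) / denom;
  ucl    = (center + half) / denom;
  put lcl= ucl=;
run;
```

Comparing lcl and ucl with the PROC FREQ output should show whether the discrepancy comes from a correction term.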

MaryA_Marion
Lapis Lazuli | Level 10

I did not see this just when I replied to your original correspondence. I like what you did. I need to study this carefully. My nonparametric class was a long time ago. Another class always seemed more relevant. I need to turn that around. It is very relevant at this time.  MM

MaryA_Marion
Lapis Lazuli | Level 10

Conclusion after a lot of study.

 

The Clopper-Pearson estimation method is based on the exact binomial distribution, not on a large-sample normal approximation. Solving the enclosed equations for p, with n, k and alpha fixed, gives the interval bounds.

Do you know how to do this in SAS (preferably not in PROC IML)?

FreelanceReinh
Jade | Level 19

Hello @MaryA_Marion,

 

The formula for the Clopper-Pearson confidence interval can be found in subsection Exact (Clopper-Pearson) Confidence Limits under Binomial Proportion in the PROC FREQ documentation → Details → Statistical Computations. In a DATA step it can be implemented as follows:

data cpci;
  k=35;       /* number of items in category of interest */
  n=40;       /* total sample size */
  alpha=0.05; /* 1 - confidence level */
  if k>0 then lcl=1/(1+(n-k+1)/(k*finv(alpha/2,2*k,2*(n-k+1))));
  else lcl=0;
  if k<n then ucl=1/(1+(n-k)/((k+1)*finv(1-alpha/2,2*(k+1),2*(n-k))));
  else ucl=1;
run;
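
An equivalent way to write the same limits, offered as a sketch rather than as what PROC FREQ does internally, uses beta quantiles: the lower limit is the alpha/2 quantile of Beta(k, n-k+1) and the upper limit is the 1-alpha/2 quantile of Beta(k+1, n-k). The data set name cpci_beta is hypothetical.

```sas
/* Sketch: Clopper-Pearson limits via beta quantiles */
data cpci_beta;
  k = 35; n = 40; alpha = 0.05;
  if k > 0 then lcl = quantile('BETA', alpha/2, k, n - k + 1);
  else lcl = 0;
  if k < n then ucl = quantile('BETA', 1 - alpha/2, k + 1, n - k);
  else ucl = 1;
  /* the F-based formula, repeated here only for comparison */
  lcl_f = 1/(1 + (n - k + 1)/(k*finv(alpha/2, 2*k, 2*(n - k + 1))));
  ucl_f = 1/(1 + (n - k)/((k + 1)*finv(1 - alpha/2, 2*(k + 1), 2*(n - k))));
  put lcl= lcl_f= ucl= ucl_f=;
run;
```

The two pairs should agree up to rounding, which is a handy cross-check when the F-based expression is typed in by hand.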
MaryA_Marion
Lapis Lazuli | Level 10

I was expecting summation code, but you approached it differently. Please see the output in the enclosed PDF file. Note that the confidence intervals look like one-sided confidence intervals for p=0 and 1. Can I say they are one-sided?

Do you know how to write summation code outside of PROC IML, i.e., using Base SAS? Please see the attached tiff.pdf file. I am trying to improve my computational skills.

 

Thank you.

FreelanceReinh
Jade | Level 19

You can use the summation formulas to verify the confidence limits. In the non-degenerate case 0<x<n (note that the x in your formula corresponds to variable k in my code) both the lower tail probability and the upper tail probability should be a/2 (for a two-sided 100(1-a)% confidence interval). See variables ltailp and utailp in the code below.

data chk(drop=i);
set cpci;
do i=0 to k;
  ltailp+comb(n,i)*ucl**i*ifn(1-ucl | n-i, 1-ucl, 1)**(n-i);
end;
do i=k to n;
  utailp+comb(n,i)*ifn(lcl | i, lcl, 1)**i*(1-lcl)**(n-i);
end;
run;
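
If the goal is to solve the tail-probability equations for p directly in Base SAS rather than only verify them, one option is plain bisection in a DATA step. The sketch below is only an illustration: the data set name, variable names and the 60-iteration count are my own choices. It uses the CDF function for the binomial sums instead of an explicit loop over comb(), and it relies on the upper tail probability increasing in p and the lower tail probability decreasing in p.

```sas
/* Sketch: Clopper-Pearson limits by bisection on the binomial tail probabilities */
data cp_bisect(drop=lo hi pmid iter);
  k = 35; n = 40; alpha = 0.05;

  /* lower limit: solve P(X >= k | p) = alpha/2, which increases with p */
  if k = 0 then lcl = 0;
  else do;
    lo = 0; hi = 1;
    do iter = 1 to 60;
      pmid = (lo + hi) / 2;
      if 1 - cdf('BINOM', k - 1, pmid, n) < alpha/2 then lo = pmid;
      else hi = pmid;
    end;
    lcl = (lo + hi) / 2;
  end;

  /* upper limit: solve P(X <= k | p) = alpha/2, which decreases with p */
  if k = n then ucl = 1;
  else do;
    lo = 0; hi = 1;
    do iter = 1 to 60;
      pmid = (lo + hi) / 2;
      if cdf('BINOM', k, pmid, n) > alpha/2 then lo = pmid;
      else hi = pmid;
    end;
    ucl = (lo + hi) / 2;
  end;
  put lcl= ucl=;
run;
```

Sixty halvings shrink the bracket far below display precision, so the result should match the closed-form limits to all printed digits.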

 

In the degenerate cases (x=0, x=n) one of the tail probabilities is trivially 1 while the other is a/2. The non-trivial of the two confidence limits is in fact a one-sided confidence limit, but its confidence level is 1-a/2, not 1-a. This interpretation holds for the non-degenerate case as well: For example, the lower limit of a two-sided exact 95% CI equals the lower limit of an upper one-sided exact 97.5% CI. Similarly, the upper limit of a two-sided exact 95% CI equals the upper limit of a lower one-sided exact 97.5% CI. If the statistical plan is to compute a two-sided exact 95% confidence interval, then this is the correct description of the interval also in the degenerate cases.
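
For the degenerate cases the non-trivial limit has a closed form, because the tail sum collapses to a single term: for x=0 the equation (1-p)**n = a/2 gives the upper limit, and for x=n the equation p**n = a/2 gives the lower limit. A small sketch with n=40 follows; the data set and variable names are hypothetical, and the beta quantiles are included only as a cross-check.

```sas
/* Sketch: closed-form Clopper-Pearson limits when x = 0 or x = n */
data degenerate;
  n = 40; alpha = 0.05;
  ucl_x0  = 1 - (alpha/2)**(1/n);               /* x = 0: solves (1-p)**n = alpha/2 */
  lcl_xn  = (alpha/2)**(1/n);                   /* x = n: solves p**n = alpha/2 */
  ucl_chk = quantile('BETA', 1 - alpha/2, 1, n);
  lcl_chk = quantile('BETA', alpha/2, n, 1);
  put ucl_x0= ucl_chk= lcl_xn= lcl_chk=;
run;
```

The appearance of alpha/2 rather than alpha in both expressions is the 1-a/2 confidence level of the one-sided limit described above.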
