topic Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ? in SAS Programming

Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

hyates — Tue, 08 Oct 2019 03:25:58 GMT

Problem

Suppose you have a dice. You wish to test if the dice is fair for the side 6. You test this by rolling the dice 235 times and noting it lands on a 6, 51 times. You then wish to compare this to the expected rate of 1/6 = 0.1666667.

Therefore, let us test this by conducting the test in R and SAS, but note the p-value for the R code is 0.04375 and the p-value for the SAS code is 0.0531.

R Code

binom.test(51, 235, (1/6), alternative = "two.sided")

SAS Code

DATA DICEROLL;
INPUT ROLL $ FREQQ; 
DATALINES; 
NO 184
YES 51
; 

PROC FREQ DATA = DATAROLL;
WEIGHT FREQQ; 
TABLES ROLL / BINOMIAL(ALL LEVEL = "YES" P = 0.1666667) ALPHA = 0.05;
EXACT BINOMIAL;
RUN;

Additional Notes

What is strange is that the one sided tests p-values are equivalent, but I cannot get the two sided p-values to match. Can someone please help me and point out how to get these equivalent? I am learning SAS and appreciate your patience and assistance in advance.

References

https://en.wikipedia.org/wiki/Binomial_test

Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

Ksharp — Tue, 08 Oct 2019 11:23:19 GMT

If you check ASE , you will see 0.038 which is close to 0.04 .

Test of H0: Proportion = 0.1666666667

ASE under H0 0.0243
Z 2.0713
One-sided Pr > Z 0.0192
Two-sided Pr > |Z| 0.0383

Exact Test
One-sided Pr >= P 0.0265
Two-sided = 2 * One-sided 0.0531

And also try R code: and see the result

binom.test(51, 235, 0.16666666666667, alternative = "two.sided")

Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

hyates — Tue, 08 Oct 2019 14:28:56 GMT

Additional Findings:

I believe it has to do with how p-values are being calculated. Consider the following R code:

binom.exact(51, 235, (1/6), tsmethod = "central")

This will give you the same result as SAS with a p-value of 0.05309.

However, if you use

binom.exact(51, 235, 1/6, tsmethod = "minlike")

This will give you the same results as binom.test() and a p-value of 0.04375.

As far as I can tell, minlike: sum of probabilities of outcomes with likelihoods less than or equal to observed. central: is 2 times the minimum of the one-sided p-values bounded by 1.

I have two follow up questions:

1. Statistically, which is the appropriate way to calculate the p-value and why would an algorithm like binom.test have minlike as a default?

2. How can we get PROC FREQ to use minlike calculation of the p-value? I think it is based on Hirji 2006 probability based method.

Reference:

https://cran.r-project.org/web/packages/exactci/exactci.pdf (this discusses the tsmethods as described above)

Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

ballardw — Tue, 08 Oct 2019 15:41:46 GMT

@hyates wrote:

1. Statistically, which is the appropriate way to calculate the p-value and why would an algorithm like binom.test have minlike as a default?

That is a question for the R developer.

Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

Ksharp — Wed, 09 Oct 2019 11:19:41 GMT

I would follow SAS. Since SAS is standard tool in statistic field .

And the new method maybe better than old one ,but it has to be admitted by most of statistics in the world .

Re: Inquiry: Obtaining equivalent results between R code binom.test and SAS code using PROC FREQ?

FreelanceReinh — Wed, 09 Oct 2019 19:29:03 GMT

Hi @hyates,

@hyates wrote (in the initial post):

What is strange is that the one sided tests p-values are equivalent, but I cannot get the two sided p-values to match.

I think the equivalence of one-sided exact p-values is due to the fact that their definition is uncontroversial:

p = min(cdf('binom',k,p0,n), 1-cdf('binom',k-1,p0,n))

(using SAS notation; cf. section "Binomial Tests," subsection "Equality Test," in the PROC FREQ documentation "Binomial Proportion").

For two-sided exact p-values, however, different definitions have been proposed. I guess the most commonly used method is the symmetrical one used by PROC FREQ ("equal-tailed test").

Many thanks for providing the reference exactci.pdf, which contains a valuable hint as to the method used by R's binom.test function (i.e. "minlike"). Being unfamiliar with R, I had been searching a while for an official documentation of binom.test including methods/formulas (unlike https://www.rdocumentation.org/packages/mosaic/versions/1.5.0/topics/binom.test), until I found at least the source code in https://svn.r-project.org/R/trunk/src/library/stats/R/binom.test.R.

Essentially, this "minlike" method appears to be the first of three alternative methods for computing two-sided exact p-values presented (in addition to the symmetrical approach) in Fleiss et al. (2003), Statistical Methods for Rates and Proportions, 3rd ed., section 2.7: Point Probability Method, Tail Probability Method and Likelihood Ratio Method.

@hyates wrote:

1. Statistically, which is the appropriate way to calculate the p-value and why would an algorithm like binom.test have minlike as a default?

None of the four above-mentioned methods is superior to all others. So, ideally, the choice should be based on statistical and subject-matter considerations. For example, if the Clopper-Pearson confidence interval is reported, it makes sense to use the corresponding equal-tailed test. Fleiss et al. (2003) mention that the alternative methods "will usually improve a bit" "in terms of statistical power and length of confidence intervals." Maybe this was the motivation of the R developer. In some cases there might even be a reason to apportion the type I error rate between the two tails of the distribution neither symmetrically (SAS method) nor data-driven (the three alternatives), so that yet another method would be preferred.

In any case, the method should be specified in advance, i.e., before the data to be analyzed are available. In particular, possible differences in terms of statistical significance (with the data at hand) must not influence the decision.

2. How can we get PROC FREQ to use minlike calculation of the p-value?

I don't think we can get that p-value from PROC FREQ, but we can easily calculate it, e.g., in a DATA step:

%let p0=1/6;
%let relErr=1.0000001;

data want(keep=n k p);
set diceroll end=last;
n+freqq;
if roll='YES' then k+freqq;
if last;
p0=&p0;
pk=pdf('binom',k,p0,n);
do j=0 to n;
  pj=pdf('binom',j,p0,n);
  p+(pj<=pk*&relErr)*pj;
end;
run;

For massive calculations (e.g. in a loop over a large number of parameter combinations) using the CDF function would be more efficient, but requires a bit more code. One could implement the algorithm in a SAS macro or user-defined function (PROC FCMP) in order to make it readily available.

PROC PRINT result for your sample dataset (rounded):

 n      k       p0          p

235    51    0.16667    0.043748

The purpose of the factor relErr (=1+1E-7, taken from the R code) is to avoid errors due to numeric representation issues (rounding errors): It happens that binomial probabilities which are mathematically equal are delivered by the PDF function in slightly different internal binary representations.

Example:

data _null_;
d=pdf('binom',0,1/7,9)-pdf('binom',2,1/7,9);
put d=;
run;

Result (SAS 9.4M5 on Windows 7):

d=2.775558E-16

In fact, both values should equal (6/7)**9, hence d=0. (I was a bit surprised to find even an example where this happened to a number with a finite binary representation: pj=27/64 with n=3, p0=1/4, j=0, 1.)

As a consequence, the p-value for n=9, p0=1/7 and k=2 (i.e. NO 7, YES 2) would be calculated incorrectly as 0.37566... without the "fuzz factor" relErr, whereas the correct result, obtained with the above DATA step, is 0.62539... (note the Boolean expression (pj<=pk*&relErr) in the code!).

It should be mentioned that in certain (rather artificial) cases it can be vice versa, i.e., the fuzz factor causes an incorrect result.

Example: n=20, p0=0.3527785166, k=4 (found in https://stat.ethz.ch/pipermail/r-help/2007-April/129113.html).