zachi_dv
Calcite | Level 5

Could someone please clarify the difference between the "Table Probability" and the "Two-sided Pr" in Fisher's exact test?

 

Which result should be considered the p-value, indicating the probability that the distribution in the table is random?

 

Zachi

 

9 REPLIES
FreelanceReinh
Jade | Level 19

Hello @zachi_dv,

 

The relevant p-value is the "Two-sided Pr <= P", not the "Table Probability (P)".

 

For the explanation let's look at "Example 3.5 Analysis of a 2x2 Contingency Table" from the PROC FREQ documentation.

 

This is the 2x2 contingency table:

Table of Exposure by Response

Exposure          Response(Heart Disease)

Frequency        |No      |Yes     |  Total
-----------------+--------+--------+
Low Cholesterol  |      6 |      2 |      8
Diet             |        |        |
-----------------+--------+--------+
High Cholesterol |      4 |     11 |     15
 Diet            |        |        |
-----------------+--------+--------+
Total                  10       13       23

You are asking about Fisher's exact test results:

       Fisher's Exact Test
----------------------------------
Table Probability (P)       0.0334
Two-sided Pr <= P           0.0393
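
For anyone who wants to reproduce this output, here is a minimal sketch. The dataset name chol and the variable coding are hypothetical (the documentation example uses its own names and formats); only the four cell counts are taken from the table above. The FISHER option of the TABLES statement requests Fisher's exact test, so the output should contain the two numbers shown above.

```sas
/* Hypothetical dataset and variable names; counts are from the table above */
data chol;
  input Exposure $ Response $ Count;
  datalines;
Low No 6
Low Yes 2
High No 4
High Yes 11
;

/* ORDER=DATA keeps the row and column order of the input data */
proc freq data=chol order=data;
  tables Exposure*Response / fisher norow nocol nopercent;
  weight Count;
run;
```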

The null hypothesis is that there is no association between the row variable and the column variable. Assuming this, and assuming that the row and column totals (8, 15, 10 and 13 in the example) are fixed, we can compute the (conditional) probabilities of all possible tables with the same marginal frequencies from the hypergeometric distribution. (There are nine such tables: the upper left frequency can take the values 0, 1, ..., 8.) The observed table is one of these, and its probability is the "Table Probability (P)" from the output. In our example it is

pdf('hyper',6,23,8,10)=comb(8,6)*comb(15,4)/comb(23,10)=0.033407...

The sum of all table probabilities that are less than or equal to this particular probability is the two-sided p-value of Fisher's exact test. In the example it is the sum of four table probabilities, as shown in the DATA step below:

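data fisher;
array pt[0:8];  /* probabilities of the nine possible tables */
do k=0 to dim(pt)-1;
  pt[k]=pdf('hyper',k,23,8,10);  /* P(upper left frequency = k), margins fixed */
  put k= 'pt[' k +(-1) ']=' pt[k];
end;
put / 'Two-sided p-value: pt[6]' @;
pv=pt[6];  /* start with the observed table (k=6) */
do k=0 to dim(pt)-1;
  if k=6 then continue;
  if pt[k]<=pt[6] then do;  /* add every table no more likely than the observed one */
    put '+pt[' k +(-1) ']' @;
    pv+pt[k];
  end;
end;
put '=' pv;
run;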

Result:

k=0 pt[0]=0.0026248486
k=1 pt[1]=0.0349979809
k=2 pt[2]=0.157490914
k=3 pt[3]=0.314981828
k=4 pt[4]=0.3062323328
k=5 pt[5]=0.1469915197
k=6 pt[6]=0.0334071636
k=7 pt[7]=0.0031816346
k=8 pt[8]=0.0000917779

Two-sided p-value: pt[6]+pt[0]+pt[7]+pt[8]=0.0393054247

 

zachi_dv
Calcite | Level 5

Thanks for the quick response. I still don't understand the significance of the table probability. What does it indicate?

See the example below, for instance. Is 0.42 the probability of getting this table out of all possible tables with the same marginal sums?

Also, what does a p-value of 1 mean? It surely doesn't mean that this is the only possible distribution of the frequencies.

 

[Image: zachi_dv_1-1715785370520.png]

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        45
Left-sided Pr <= F          0.4232
Right-sided Pr >= F              1

Table Probability (P)       0.4232
Two-sided Pr <= P                1

FreelanceReinh
Jade | Level 19

Here is the DATA step from my previous post, adapted to your data, together with the results in the log:

data fisher;
array pt[0:5];
do k=0 to dim(pt)-1;
  pt[k]=pdf('hyper',k,59,5,9);
  put k= 'pt[' k +(-1) ']=' pt[k];
end;
put / 'Two-sided p-value: pt[0]' @;
pv=pt[0];
do k=0 to dim(pt)-1;
  if k=0 then continue;
  if pt[k]<=pt[0] then do;
    put '+pt[' k +(-1) ']' @;
    pv+pt[k];
  end;
end;
put '=' pv;
run;

k=0 pt[0]=0.4232114743
k=1 pt[1]=0.4140112249
k=2 pt[2]=0.1409399914
k=3 pt[3]=0.0205537488
k=4 pt[4]=0.0012583928
k=5 pt[5]=0.0000251679

Two-sided p-value: pt[0]+pt[1]+pt[2]+pt[3]+pt[4]+pt[5]=1

So, in your example the observed table is that with the largest (conditional) probability among the six possible tables, given the marginal totals. Hence, the two-sided p-value must be 1 because now, by definition, it is the sum of the (conditional) probabilities of all those six tables. This is the same situation as in the 2020 thread that Rick linked to.

 

The fact that the "Table Probability (P)" pt[0] in your example is only slightly larger than pt[1] indicates that your dataset is close to a situation where pt[1] and not pt[0] is the largest probability (and hence, the two-sided p-value might be considerably smaller). Indeed, by changing the upper left cell frequency from 45 to 43 we get there:

data test;
input Type $ Site $ Count;
cards;
JT1 TM 43
JT1 JS  9
JT2 TM  5
JT2 JS  0
;

proc freq data=test order=data;
tables Type*Site / chisq norow nocol nopercent;
weight Count;
run;

Result:

       Fisher's Exact Test
----------------------------------
Table Probability (P)       0.4089
Two-sided Pr <= P           0.5818

Calculation:

k=0 pt[0]=0.4089468955
k=1 pt[1]=0.4182411432
k=2 pt[2]=0.148707962
k=3 pt[3]=0.0226294725
k=4 pt[4]=0.0014444344
k=5 pt[5]=0.0000300924

Two-sided p-value: pt[0]+pt[2]+pt[3]+pt[4]+pt[5]=0.5817588568
Rick_SAS
SAS Super FREQ

For more information about why this situation uses the PDF of a hypergeometric distribution, see

Models and simulation for 2x2 contingency tables - The DO Loop (sas.com)

Rick_SAS
SAS Super FREQ

The "Two-sided Pr <= P" is the p-value. The entry labeled "Table Probability (P)" is the probability of getting your table from among all the possible tables that have the same marginal row and column totals as the observed table.

 

I think FreelanceReinh made a nice summary of Fisher's exact test at
Interpreting statistical test output with Fisher's exact p-value of 1.... - SAS Support Communities

Also, see https://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html although the example in that article is an exact Chi-Square test, which is a different test.

FreelanceReinh
Jade | Level 19

@Rick_SAS wrote:

Wow, @FreelanceReinh is super fast today! 


Thanks. I knew I had to be fast to be first. 🙂

zachi_dv
Calcite | Level 5
Thanks for adding to FreelanceReinh's reply. So now I understand how it is computed, but what is the significance of this number?
FreelanceReinh
Jade | Level 19

@zachi_dv wrote:
So now I understand how it is computed, but what is the significance of this number?

You could say, for example, that if the observed table probability is greater than the significance level you had decided upon (e.g., α=0.05), then you don't need to look any further at the p-values (two-sided, left-sided, right-sided): they are all necessarily greater than or equal to the table probability, because each of them is a sum that includes the probability of the observed table. That is, the result of Fisher's exact test is definitely not significant at level α.
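
This relationship between the table probability and the three p-values can be checked numerically. Below is a minimal sketch that uses the cell counts of the cholesterol example from earlier in this thread; the variable names (a, b, c, d, p_obs, p_left, p_right, p_two) are my own, and the small relative tolerance in the tie comparison is an assumption, not necessarily what PROC FREQ does internally.

```sas
data _null_;
  /* Observed 2x2 cell counts: a=(1,1), b=(1,2), c=(2,1), d=(2,2) */
  a=6; b=2; c=4; d=11;
  n  = a+b+c+d;             /* grand total        */
  r1 = a+b;                 /* first row total    */
  c1 = a+c;                 /* first column total */
  lo = max(0, r1+c1-n);     /* smallest possible (1,1) frequency */
  hi = min(r1, c1);         /* largest possible (1,1) frequency  */
  p_obs = pdf('hyper', a, n, r1, c1);   /* table probability */
  p_left=0; p_right=0; p_two=0;
  do k=lo to hi;
    p = pdf('hyper', k, n, r1, c1);
    if k<=a then p_left+p;              /* left-sided:  Pr <= F */
    if k>=a then p_right+p;             /* right-sided: Pr >= F */
    /* two-sided: tables no more likely than the observed one;
       the tolerance is an assumption to guard against rounding in ties */
    if p <= p_obs*(1+1e-7) then p_two+p;
  end;
  put p_obs= / p_left= / p_right= / p_two=;
run;
```

Each of the three sums contains the term for k=a, i.e., p_obs itself, so none of them can be smaller than the table probability. For these counts p_two should agree with the two-sided p-value computed earlier in the thread.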

 

Technically, the table probability is the test statistic of Fisher's exact test. So it corresponds, e.g., to the chi-square value of Pearson's chi-square test.
