Could someone please clarify the difference between "Table Probability" and "Two-sided Pr" in Fisher's exact test?
Which result should be considered the p-value indicating the probability that the distribution in the table is random?
Zachi
Hello @zachi_dv,
The relevant p-value is the "Two-sided Pr <= P", not the "Table Probability (P)".
For the explanation let's look at "Example 3.5 Analysis of a 2x2 Contingency Table" from the PROC FREQ documentation.
This is the 2x2 contingency table:
Table of Exposure by Response

Exposure           Response(Heart Disease)

Frequency        |No      |Yes     |  Total
-----------------+--------+--------+
Low Cholesterol  |      6 |      2 |      8
Diet             |        |        |
-----------------+--------+--------+
High Cholesterol |      4 |     11 |     15
Diet             |        |        |
-----------------+--------+--------+
Total                  10       13       23
You are asking about Fisher's exact test results:
Fisher's Exact Test
----------------------------------
Table Probability (P)       0.0334
Two-sided Pr <= P           0.0393
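For reference, output of this kind could be produced with a PROC FREQ step along the following lines. This is only a sketch: the data set name FatComp and the short character values Low/High and No/Yes are assumptions made here for illustration; only the four cell counts come from the table above.

```sas
/* Hypothetical data set and value labels; counts taken from the 2x2 table above */
data FatComp;
   input Exposure $ Response $ Count;
   datalines;
Low  No   6
Low  Yes  2
High No   4
High Yes 11
;

proc freq data=FatComp order=data;
   /* For 2x2 tables the CHISQ option also reports Fisher's exact test */
   tables Exposure*Response / chisq;
   weight Count;   /* each input row is a cell count, not a single observation */
run;
```

ORDER=DATA keeps the rows and columns in the order of the input lines, so the upper left cell is the one with frequency 6.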
The null hypothesis is that there is no association between the row variable and the column variable. Assuming this, and assuming that the row and column totals (8, 15, 10 and 13 in the example) are fixed, we can compute the (conditional) probabilities of all possible tables with the same marginal frequencies from the hypergeometric distribution. (There are nine such tables: the upper left frequency can take the values 0, 1, ..., 8.) The observed table is one of these, and its probability is the "Table Probability (P)" from the output. In our example it is
pdf('hyper',6,23,8,10)=comb(8,6)*comb(15,4)/comb(23,10)=0.033407...
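Written out as a formula, with X denoting the upper left cell frequency (a symbol introduced here for illustration, not part of the PROC FREQ output), the same calculation reads roughly as follows:

```latex
\[
P(X = k) = \frac{\binom{8}{k}\,\binom{15}{10-k}}{\binom{23}{10}},
\qquad
P(X = 6) = \frac{28 \cdot 1365}{1144066} \approx 0.0334
\]
```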
The sum of all table probabilities that are less than or equal to this particular probability is the two-sided p-value of Fisher's exact test. In the example it is the sum of four table probabilities, as shown in the DATA step below:
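data fisher;
   array pt[0:8];
   /* probabilities of all nine possible tables (k = upper left frequency) */
   do k=0 to dim(pt)-1;
      pt[k]=pdf('hyper',k,23,8,10);
      put k= 'pt[' k +(-1) ']=' pt[k];
   end;
   /* start with the observed table (k=6) ... */
   put / 'Two-sided p-value: pt[6]' @;
   pv=pt[6];
   /* ... and add every other table that is no more likely than the observed one */
   do k=0 to dim(pt)-1;
      if k=6 then continue;
      if pt[k]<=pt[6] then do;
         put '+pt[' k +(-1) ']' @;
         pv+pt[k];
      end;
   end;
   put '=' pv;
run;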
Result:
k=0 pt[0]=0.0026248486
k=1 pt[1]=0.0349979809
k=2 pt[2]=0.157490914
k=3 pt[3]=0.314981828
k=4 pt[4]=0.3062323328
k=5 pt[5]=0.1469915197
k=6 pt[6]=0.0334071636
k=7 pt[7]=0.0031816346
k=8 pt[8]=0.0000917779

Two-sided p-value: pt[6]+pt[0]+pt[7]+pt[8]=0.0393054247
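In formula form, again writing X for the upper left cell frequency (notation assumed here, not taken from the SAS output), the sum reported in the log is approximately:

```latex
\[
p_{\text{two-sided}}
  = \sum_{k:\; P(X=k)\,\le\,P(X=6)} P(X=k)
  = 0.0334072 + 0.0026248 + 0.0031816 + 0.0000918
  \approx 0.0393
\]
```

Note that pt[1]=0.0350 is just above pt[6]=0.0334, so the table with k=1 is not included in the sum.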
Thanks for the quick response. I still don't understand the significance of the table probability. What does it indicate?
See the example below, for instance. Is 0.42 the probability of getting this table from among all possible tables with the same marginal sums?
Also, what does a p-value of 1 mean? It surely doesn't mean that this is the only possible distribution of the frequencies.
Fisher's Exact Test | |
Cell (1,1) Frequency (F) | 45 |
Left-sided Pr <= F | 0.4232 |
Right-sided Pr >= F | 1 |
Table Probability (P) | 0.4232 |
Two-sided Pr <= P | 1 |
Here is the DATA step from my previous post, adapted to your data, together with the results in the log:
data fisher;
   array pt[0:5];
   do k=0 to dim(pt)-1;
      pt[k]=pdf('hyper',k,59,5,9);
      put k= 'pt[' k +(-1) ']=' pt[k];
   end;
   put / 'Two-sided p-value: pt[0]' @;
   pv=pt[0];
   do k=0 to dim(pt)-1;
      if k=0 then continue;
      if pt[k]<=pt[0] then do;
         put '+pt[' k +(-1) ']' @;
         pv+pt[k];
      end;
   end;
   put '=' pv;
run;

k=0 pt[0]=0.4232114743
k=1 pt[1]=0.4140112249
k=2 pt[2]=0.1409399914
k=3 pt[3]=0.0205537488
k=4 pt[4]=0.0012583928
k=5 pt[5]=0.0000251679

Two-sided p-value: pt[0]+pt[1]+pt[2]+pt[3]+pt[4]+pt[5]=1
So, in your example the observed table is that with the largest (conditional) probability among the six possible tables, given the marginal totals. Hence, the two-sided p-value must be 1 because now, by definition, it is the sum of the (conditional) probabilities of all those six tables. This is the same situation as in the 2020 thread that Rick linked to.
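The one-sided values in the posted output can be checked in a similar way. The following is a minimal sketch, assuming the observed table has cell frequencies 45, 9, 5 and 0 (so that the first row total is 54, the first column total is 50 and the grand total is 59); the variable names are made up for illustration.

```sas
/* Sketch only: margins are inferred from the counts discussed in this thread */
data _null_;
   N  = 59;   /* grand total */
   r1 = 54;   /* total of row 1 (assumed 45+9) */
   c1 = 50;   /* total of column 1 (assumed 45+5) */
   f  = 45;   /* observed cell (1,1) frequency */
   /* left-sided: P(cell (1,1) <= f) */
   left  = cdf('hyper', f, N, r1, c1);
   /* right-sided: P(cell (1,1) >= f) = 1 - P(<= f) + P(= f) */
   right = 1 - left + pdf('hyper', f, N, r1, c1);
   put left=6.4 right=6.4;
run;
```

With these margins, 45 is the smallest value that cell (1,1) can take, which is why the left-sided p-value coincides with the table probability and the right-sided p-value equals 1.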
The fact that the "Table Probability (P)" pt[0] in your example is only slightly larger than pt[1] indicates that your dataset is close to a situation where pt[1] and not pt[0] is the largest probability (and hence, the two-sided p-value might be considerably smaller). Indeed, by changing the upper left cell frequency from 45 to 43 we get there:
data test;
input Type $ Site $ Count;
cards;
JT1 TM 43
JT1 JS 9
JT2 TM 5
JT2 JS 0
;
proc freq data=test order=data;
tables Type*Site / chisq norow nocol nopercent;
weight Count;
run;
Result:
Fisher's Exact Test
----------------------------------
Table Probability (P)       0.4089
Two-sided Pr <= P           0.5818
Calculation:
k=0 pt[0]=0.4089468955
k=1 pt[1]=0.4182411432
k=2 pt[2]=0.148707962
k=3 pt[3]=0.0226294725
k=4 pt[4]=0.0014444344
k=5 pt[5]=0.0000300924

Two-sided p-value: pt[0]+pt[2]+pt[3]+pt[4]+pt[5]=0.5817588568
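Since pt[1] is now the only probability larger than pt[0], the same result can be obtained as a complement. Writing K for the index k used in the DATA step (notation assumed here):

```latex
\[
p_{\text{two-sided}} = 1 - P(K = 1) = 1 - 0.4182411432 = 0.5817588568
\]
```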
For more information about why this situation uses the PDF of a hypergeometric distribution, see
Models and simulation for 2x2 contingency tables - The DO Loop (sas.com)
The "Two-sided Pr <= P" value is the p-value. The table entry that says "Table Probability (P)" is the probability of getting your table from among all the possible tables that have the same marginal row and column totals as the observed table.
I think FreelanceReinh made a nice summary of Fisher's exact test at
Interpreting statistical test output with Fisher's exact p-value of 1.... - SAS Support Communities
Also, see https://blogs.sas.com/content/iml/2015/10/28/simulation-exact-tables.html although the example in that article is an exact Chi-Square test, which is a different test.
Wow, @FreelanceReinh is super fast today!
@Rick_SAS wrote:
Wow, @FreelanceReinh is super fast today!
Thanks. I knew I had to be fast to be first. 🙂
@zachi_dv wrote:
So now I understand how it is computed, but what is the significance of this number?
You could say, for example: if the observed table probability is greater than the significance level you had decided upon (e.g., α=0.05), then you don't need to look any further at the p-values (two-sided, left-sided, right-sided), because each of them includes the observed table's probability in its sum and is therefore greater than or equal to the table probability. In that case the result of Fisher's exact test is definitely not significant at level α.
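As a compact restatement of this argument, with α denoting the chosen significance level and the subscripted symbols being shorthand introduced here (not labels from the SAS output):

```latex
\[
P_{\text{table}} \le \min\{\,p_{\text{left}},\; p_{\text{right}},\; p_{\text{two-sided}}\,\}
\quad\Longrightarrow\quad
\bigl(P_{\text{table}} > \alpha \;\Rightarrow\; \text{all three p-values} > \alpha\bigr)
\]
```

In the first example of this thread the converse check does not settle anything: the table probability 0.0334 is below 0.05, so the p-value 0.0393 still has to be looked at.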
Technically, the table probability is the test statistic of Fisher's exact test. So it corresponds, e.g., to the chi-square value of Pearson's chi-square test.