Hi:
Apologies if this is a terribly elementary question but I have run into a genuine 'first' for me and am hoping to get some feedback on correctly interpreting the test statistics.
I have a bunch of categorical data and am running simple 2X2 contingency tables.
Some of the cell sizes are very small (including values of 0) and therefore I chose the Fisher's exact option on my PROC FREQ.
The null hypothesis is that there is no difference in the outcome (var2; Yes=1; No=2) between persons who have the exposure of interest (var1=1) and those who do not (var1=2).
To assess the hypothesis I am using the 2-sided probability in the Fisher's Exact stats table.
The 'first' for me is that for some of my 2X2 tables, the 2-sided probability is equal to 1.00.
In the example below, my overall sample size is 155. I should note, however, that in another data set (overall size = 729) I am running similar 2X2 contingency tables, and some of the Fisher's exact 2-sided p-values there also equal 1 when some of the cell sizes are less than 5.
I am unsure whether this should raise a red flag. With traditional Chi-Square tests of contingency, I understand that probability values range from 0 to 1, so seeing a probability of exactly 1 under the exact test (which I believe is appropriate given the small cell sizes in this example) throws me in terms of how to interpret it correctly.
Am I missing something obvious?
My SAS code and output are below.
Thanks so much.
/* 2x2 cross-tabulation with chi-square statistics and Fisher's exact test */
proc freq data=analysis;
  tables var1*var2 / chisq exact;
run;
OUTPUT

Table of var1 by var2
var1 \ var2 | Yes (1) | No (2) | Total
No (0) | 93 | 58 | 151
Yes (1) | 3 | 1 | 4
Total | 96 | 59 | 155

Statistics for Table of var1 by var2
Statistic | DF | Value | Prob
Chi-Square | 1 | 0.2973 | 0.5856
Likelihood Ratio Chi-Square | 1 | 0.3152 | 0.5745
Continuity Adj. Chi-Square | 1 | 0.0006 | 0.9812
Mantel-Haenszel Chi-Square | 1 | 0.2953 | 0.5868
Phi Coefficient | | -0.044 |
Contingency Coefficient | | 0.0438 |
Cramer's V | | -0.044 |
WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F) | 93
Left-sided Pr <= F | 0.5081
Right-sided Pr >= F | 0.8564
Table Probability (P) | 0.3645
Two-sided Pr <= P | 1

Sample Size = 155
Hi @closetcoer,
I don't think that these results should "raise a red flag." The p-value 1 in these cases just means that the observed table has the largest probability among all tables with the same marginal totals. In your example there are only five possible tables (because the smallest marginal total is 4): just let the cell var1=var2=1 (the Yes/Yes cell, observed frequency 3) take the frequencies 0, 1, 2, 3 and 4, and determine the other cell frequencies from that. The corresponding table probabilities are 0.0197, 0.1349, 0.3373, 0.3645 (your table, see "Table Probability" in the output) and 0.1436, respectively. So, the (conditional) probability of getting a table with the observed probability (0.3645) or a smaller probability is indeed 0.0197+0.1349+0.3373+0.3645+0.1436=1. With only five possible outcomes, it is not really surprising to observe the most probable one among them.
So, in this case the observed cell frequencies are "as close to the null hypothesis as they can be," given the marginal totals. In particular, the p-value 1 being greater than your significance level means (as usual) that you cannot reject the null hypothesis.
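The enumeration above can be checked with a short DATA step. This is only a minimal sketch: it assumes the margins from the posted output (rows 151 and 4, columns 96 and 59) and uses the hypergeometric formula for each table probability. The data set name `fisher_check` and the variables `k`, `prob`, `p_obs` and `two_sided_p` are illustrative names, not anything taken from the original analysis.

```sas
/* Enumerate all 2x2 tables sharing the observed margins and
   compute each table's hypergeometric probability.
   k = frequency in the Yes/Yes cell (observed value: 3). */
data fisher_check;
  /* probability of the observed table (k = 3) */
  p_obs = comb(96, 3) * comb(59, 1) / comb(155, 4);
  two_sided_p = 0;
  do k = 0 to 4;
    prob = comb(96, k) * comb(59, 4 - k) / comb(155, 4);
    /* two-sided p-value: accumulate every table that is no more
       probable than the observed one (small tolerance for rounding) */
    if prob <= p_obs + 1e-12 then two_sided_p = two_sided_p + prob;
    output;
  end;
run;

proc print data=fisher_check noobs;
  format p_obs prob two_sided_p 6.4;
run;
```

The `prob` column should reproduce the five table probabilities quoted above. Because no table is more probable than the observed one, every row is accumulated and `two_sided_p` reaches 1 in the last row.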
Thanks SO much! Not being a biostatistician, I was loath to conclude that I hadn't somehow missed the obvious, but your explanation is very clear.