I'm running a chi-square test on two binary variables. The data set is large (>1 million rows) and unbalanced (the event is rare). The test is statistically significant (p < .0001), but Cramer's V is very small (.006). I took this to mean that there is essentially no relationship and that the small p-value is a product of the large sample size and the power that comes with it. However, the odds ratio is 4.4.
I'm trying to understand how one effect size (Cramer's V) can differ so much from another (the odds ratio). Is Cramer's V sensitive to data imbalance? Is the OR preferred in this case?
Looking through the formulas for calculating Cramer's V or phi or the contingency coefficient, which are all the same in this case, it appears that the small value is determined by the imbalance. I just have a hard time thinking of these parameters as effect sizes, when they are a measure of agreement - and when the sample is strongly imbalanced towards one row, the values will be small. Is there some confounding variable, such that these values are influenced by Simpson's paradox?
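To make the point about imbalance concrete, here is a minimal Python sketch using made-up counts (they are not the original poster's data, and the variable names are only illustrative). It computes the odds ratio, phi, and the chi-square p-value for a 2x2 table of roughly a million rows in which both the exposure and the event are rare.

```python
import math

# Hypothetical 2x2 counts, chosen only to mimic a rare-event table.
# Rows: exposed / not exposed. Columns: event / no event.
a, b = 10, 2_000           # exposed:     event, no event
c, d = 1_136, 1_000_000    # not exposed: event, no event
n = a + b + c + d

# Odds ratio: depends only on the cross-product of the cells.
odds_ratio = (a * d) / (b * c)

# Phi (equal to Cramer's V in absolute value for a 2x2 table):
# the same cross-product difference, scaled by all four marginal totals.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson chi-square without continuity correction, and its p-value on 1 df.
chi2 = n * phi ** 2
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"odds ratio = {odds_ratio:.2f}")
print(f"phi        = {phi:.4f}")
print(f"chi-square = {chi2:.1f}, p = {p_value:.2e}")
```

With these counts the odds ratio comes out near 4.4 while phi stays around .005, because the denominator of phi is driven by the very large "not exposed, no event" margin, whereas the odds ratio does not depend on the margins.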
SteveDenham
I have come across the same question. May I ask whether you found any evidence (a paper, thesis, journal article, or book chapter) to support this limitation of Cramer's V? I look forward to your reply. Many thanks!
Huang Ling
2024.10.10
Well, I wouldn't call it refereed so much as crowdsourced, but https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V provides a lot of insight for those of us who aren't familiar with the statistic. The formulas there use k for the number of columns and r for the number of rows, so we are not limited to square tables. Calculating with the summation formulas given doesn't seem to be hampered by cells with zero counts, so that isn't an issue. The biggest issue with the use of Cramer's V is that it can be severely biased toward 1, that is, it tends to overestimate the strength of association, and unbalanced data increases this.
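For reference, the standard definition can be written as below, where n (the total count) is my notation and k and r are the numbers of columns and rows as above:

```latex
V = \sqrt{\frac{\chi^2 / n}{\min(k - 1,\; r - 1)}}
```

For a 2x2 table, min(k - 1, r - 1) = 1, so V reduces to sqrt(chi-square / n), which is the absolute value of phi.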
Is that of any help to you?
SteveDenham