I'm running a chi-square test on two binary variables. The data set is large (>1 million rows) and unbalanced (the event is rare). The test is statistically significant (p < .0001), but Cramer's V is very small (.006). I took this to mean that there is no meaningful relationship and that the p-value simply reflects the power that comes with such a large sample. However, the odds ratio is 4.4.
I'm trying to understand how one effect size (Cramer's V) can differ so much from another (the odds ratio). Is Cramer's V sensitive to data imbalance? Is the OR preferred in this case?
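To make the situation concrete, here is a minimal Python sketch of a 2x2 table in which both things happen at once. The counts are hypothetical (they are not the data described above), chosen only so that the event and the exposure are both rare; the names `a`, `b`, `c`, `d` are the usual 2x2 cell labels and are an assumption of this sketch.

```python
import math

# Hypothetical 2x2 counts (NOT the poster's data):
#                 event   no event
# exposed           a        b
# unexposed         c        d
a, b = 40, 9_960
c, d = 1_000, 1_089_000
n = a + b + c + d

# Odds ratio: ratio of the odds of the event in the two rows.
odds_ratio = (a * d) / (b * c)

# Phi coefficient; for a 2x2 table Cramer's V equals |phi|.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson chi-square (no continuity correction) is n * phi^2 for a 2x2 table.
chi2 = n * phi ** 2

# Upper-tail p-value of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

print(f"odds ratio      = {odds_ratio:.2f}")
print(f"Cramer's V      = {abs(phi):.4f}")
print(f"chi-square      = {chi2:.1f}, p = {p_value:.2e}")
print(f"risk exposed    = {risk_exposed:.4%}")
print(f"risk unexposed  = {risk_unexposed:.4%}")
```

In this made-up table the event is roughly four times as likely in the exposed row, yet almost every observation sits in the "unexposed, no event" cell, so a correlation-type measure such as V stays close to zero while the odds ratio does not.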
Looking through the formulas for Cramer's V, phi, and the contingency coefficient, which are identical or nearly identical for a 2x2 table with a value this small, it appears that the small value is determined by the imbalance. I just have a hard time thinking of these parameters as effect sizes when they are measures of agreement, and when the sample is strongly imbalanced towards one row, the values will be small. Is there some confounding variable, such that these values are influenced by Simpson's paradox?
SteveDenham
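One way to see how much the imbalance limits these measures is to hold the row and column totals fixed and ask for the largest phi they allow. The sketch below does that in Python; the margins are hypothetical, and `phi_2x2` and `max_phi_for_margins` are illustrative helper names, not functions from any library. With fixed margins phi increases linearly in the top-left cell, so the maximum is reached when that cell is as large as the margins permit.

```python
import math

def phi_2x2(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def max_phi_for_margins(row1, col1, n):
    """Largest phi attainable with first-row total row1, first-column total col1, n overall."""
    a = min(row1, col1)   # put as many counts as possible in the top-left cell
    b = row1 - a
    c = col1 - a
    d = n - a - b - c
    return phi_2x2(a, b, c, d)

# Hypothetical, strongly unequal margins: 10,000 exposed and 1,040 events in 1.1 million rows.
phi_max_unbalanced = max_phi_for_margins(10_000, 1_040, 1_100_000)

# Hypothetical balanced margins for comparison.
phi_max_balanced = max_phi_for_margins(550_000, 550_000, 1_100_000)

print(f"max phi, unequal margins : {phi_max_unbalanced:.3f}")
print(f"max phi, balanced margins: {phi_max_balanced:.3f}")
```

Under these assumed margins, even a table in which every event falls in the exposed row cannot push phi anywhere near 1, whereas balanced margins allow the full range.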
I've come across the same question. May I ask whether you found any evidence (a paper, thesis, journal article, or book chapter) to support this limitation of Cramer's V? I'm eager for your reply. Many thanks!
Huang Ling
2024.10.10
Well, I wouldn't call it refereed so much as crowdsourced, but https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V provides a lot of insight for those of us who aren't familiar with it. The formulas there use k for the number of columns and r for the number of rows, so we are not limited to square tables. Calculating with the summation formulas given doesn't seem to be hampered by cells with zero counts, so that isn't an issue. The biggest issue with the use of Cramer's V is that it can be a heavily biased estimator that tends to overestimate the strength of association (that is, it is biased toward 1), and unbalanced data increases this.
Is that of any help to you?
SteveDenham
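For anyone who wants to try the summation formulas on an r x k table, here is a rough Python sketch. The function name `cramers_v` and the example table are hypothetical; the `bias_corrected` option follows my reading of the bias correction described on that Wikipedia page (attributed there to Bergsma), so treat it as an assumption and check it against the page before relying on it.

```python
import math

def cramers_v(table, bias_corrected=False):
    """Cramer's V for an r x k table given as a list of rows of counts.

    Observed zeros are fine; an entirely empty row or column gives an
    expected count of zero and would raise ZeroDivisionError here.
    """
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]
    n = sum(row_totals)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    if not bias_corrected:
        return math.sqrt(phi2 / min(k - 1, r - 1))

    # Assumed form of the bias correction (see lead-in).
    phi2_adj = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_adj = k - (k - 1) ** 2 / (n - 1)
    r_adj = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_adj / min(k_adj - 1, r_adj - 1))

# Hypothetical 2 x 3 table containing one zero cell.
example = [[12, 0, 3],
           [5, 9, 1]]
v_plain = cramers_v(example)
v_corrected = cramers_v(example, bias_corrected=True)
print(f"V = {v_plain:.3f}, bias-corrected V = {v_corrected:.3f}")
```

On this small made-up table the corrected value comes out lower than the plain one, which is the direction you would expect if the uncorrected statistic overestimates the association.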