NMB82
Obsidian | Level 7

I'm running a chi-square test with 2 binary variables. The data set is large (>1 million rows) and not balanced (rare event). The test is statistically significant (p < .0001), but Cramer's V is very small (.006). I took this to mean there is no real relationship and that the p-value is due to the large sample size/power. However, the odds ratio is 4.4.

 

I'm trying to understand how one effect size (Cramer's V) can be so unlike another (the odds ratio). Is Cramer's V sensitive to data imbalance? Is the OR preferred in this case?

 

[Attached image: OR.jpg]

SteveDenham
Jade | Level 19

Looking through the formulas for calculating Cramer's V, phi, or the contingency coefficient, which are all the same or nearly so in this case (for a 2x2 table, V is the absolute value of phi), it appears that the small value is determined by the imbalance. I just have a hard time thinking of these parameters as effect sizes when they are measures of association, and when the sample is strongly imbalanced towards one row, the values will be small. Is there some confounding variable, such that these values are influenced by Simpson's paradox?
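To make the imbalance point concrete, here is a minimal Python sketch (not SAS output) that computes both measures from a single 2x2 table. The cell counts are hypothetical, chosen only to resemble the situation described (about 1 million rows, a rare event, a rare exposure); they are not the original poster's data. For a 2x2 table, V reduces to sqrt(chi-square / n).

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v_2x2(a, b, c, d):
    """Cramer's V for a 2x2 table; min(r - 1, k - 1) is 1, so V = sqrt(chi2 / n)."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)

# Hypothetical counts (an assumption for illustration only):
# rows = exposed / unexposed, columns = event / no event.
a, b = 9, 1991        # exposed:   2,000 rows, 9 events
c, d = 998, 997002    # unexposed: 998,000 rows, 998 events

or_hat = odds_ratio(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
v = cramers_v_2x2(a, b, c, d)

print(f"odds ratio = {or_hat:.2f}")
print(f"chi-square = {chi2:.1f}")
print(f"Cramer's V = {v:.4f}")
```

With these made-up counts the odds ratio is above 4 while V stays below 0.01. V is scaled by the total n and by all four marginal totals, so a few thousand rows in the rare row and rare column cannot move it much, whereas the odds ratio compares the odds within each row and does not depend on how many rows fall in each group.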

 

SteveDenham

reneecandy
Calcite | Level 5

I came across the same question. May I ask whether you found any evidence (a paper, thesis, journal article, or book chapter) to support this limitation of Cramer's V? Eager for your reply! Many thanks!

 

Huang Ling

2024.10.10

 

SteveDenham
Jade | Level 19

Well, I wouldn't call it refereed so much as crowd sourced, but https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V provides a lot of insight for those of us who aren't familiar with it. The formulas there use k for the number of columns and r for the number of rows, so we are not limited to square tables. Calculating with the summation formulas given doesn't seem to be hampered by cells with zero counts, so that isn't an issue. The biggest issue with the use of Cramer's V is that it can be a heavily biased estimator that tends to overestimate the strength of association (i.e., it is biased toward 1), and unbalanced data increases this.
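As a rough illustration of that bias, the following Python sketch computes V for a general r x k table, with and without the bias correction described on that Wikipedia page. The function names and the small example table are my own assumptions for illustration, not SAS functions or data from this thread.

```python
import math

def chi_square(table):
    """Pearson chi-square and total n for an r x k table given as a list of rows.

    Zero observed counts are fine; an entirely empty row or column would give
    a zero expected count and a division by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table, bias_corrected=False):
    """Cramer's V; optionally apply the bias correction from the Wikipedia page."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    if bias_corrected:
        # Shrink phi^2 and the effective table dimensions by terms of order 1/(n - 1).
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        k = k - (k - 1) ** 2 / (n - 1)
        r = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(k - 1, r - 1))

# Hypothetical small 2x3 table (n = 12), where the upward bias is easy to see.
small_table = [[3, 1, 2],
               [2, 3, 1]]

v_raw = cramers_v(small_table)
v_corrected = cramers_v(small_table, bias_corrected=True)
print(f"uncorrected V = {v_raw:.3f}, bias-corrected V = {v_corrected:.3f}")
```

The correction terms shrink as n grows, so with more than a million rows the corrected and uncorrected values should be nearly identical; the correction matters mostly for small samples or tables with many cells.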

 

Is that of any help to you?

 

SteveDenham
