Statistical Procedures

Programming the statistical procedures from SAS
NMB82
Obsidian | Level 7

I'm running a chi-square test on two binary variables. The data set is large (>1 million rows) and unbalanced (rare event). The test is statistically significant (p < .0001), but the Cramer's V is very small (.006). I took this to mean there is essentially no relationship and that the small p-value reflects the very large sample size (high power). However, the odds ratio is 4.4.

 

I'm trying to understand how one effect size (Cramer's V) can be so unlike another (the odds ratio). Is Cramer's V sensitive to data imbalance? Is the OR preferred in this case?
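For a 2x2 table the chi-square statistic equals n times V squared, so the reported V and p-value can be checked against each other. The sketch below is only a rough consistency check: n = 1,000,000 is an assumption (the post says only "> 1 million rows"), and the p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

n = 1_000_000   # assumed sample size; the post only says "> 1 million rows"
v = 0.006       # Cramer's V reported in the post

# For a 2x2 table, V = |phi| and chi-square = n * phi^2
chi2 = n * v ** 2

# Upper-tail probability of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.1f}, p = {p_value:.1e}")
```

Under that assumption the chi-square is about 36, which is far into the tail for 1 degree of freedom, so a tiny V and p < .0001 are not in conflict at this sample size.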

 

[Attached image: OR.jpg]

3 REPLIES
SteveDenham
Jade | Level 19

Looking through the formulas for Cramer's V, phi, and the contingency coefficient, which are identical (V and |phi|) or nearly so (the contingency coefficient) for a 2x2 table with an association this weak, it appears that the small value is driven by the imbalance. I just have a hard time thinking of these statistics as effect sizes when they are measures of association, and when the sample is strongly imbalanced towards one row, the values will be small. Is there some confounding variable, such that these values are influenced by Simpson's paradox?
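To make the effect of imbalance concrete, here is a minimal sketch with made-up counts (not the poster's data), chosen only to resemble the situation described: one million rows, a rare exposure, and a rare outcome. The odds ratio depends only on the cross-product ratio ad/bc, so it does not change when a row or column is scaled up or down, whereas phi is the correlation between two 0/1 indicators and shrinks when the margins are extreme.

```python
import math

# Hypothetical counts, chosen only to resemble the situation in the question
a, b = 13, 2_987          # exposed:   events, non-events
c, d = 1_000, 996_000     # unexposed: events, non-events
n = a + b + c + d

# Odds ratio: cross-product ratio, unaffected by how unbalanced the margins are
odds_ratio = (a * d) / (b * c)

# phi for a 2x2 table; Cramer's V equals |phi| in this case
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
cramers_v = abs(phi)
chi2 = n * phi ** 2

print(f"n = {n}, OR = {odds_ratio:.2f}, V = {cramers_v:.4f}, chi-square = {chi2:.1f}")
```

With these invented counts the odds ratio is a little above 4 while V stays below 0.01, so the two numbers answer different questions: the OR describes how much the odds change between groups, and V describes how much of the overall variation in the table the association accounts for.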

 

SteveDenham

reneecandy
Calcite | Level 5

I have come across the same question. May I ask whether you found any evidence (a paper, thesis, journal article, or book chapter) that supports this limitation of Cramer's V? I'm eager for your reply. Many thanks!

 

Huang Ling

2024.10.10

 

SteveDenham
Jade | Level 19

Well, I wouldn't call it refereed so much as crowdsourced, but https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V provides a lot of insight for those of us who aren't familiar with it. The formulas there use k for the number of columns and r for the number of rows, so we are not limited to square tables. Calculating with the summation formulas given doesn't seem to be hampered by cells with zero counts, so that isn't an issue. The biggest issue with Cramer's V is that it can be a heavily biased estimator that tends to overestimate the strength of association (that is, it is biased toward 1), and unbalanced data increases this.
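The same Wikipedia page gives a bias-corrected version of the statistic. A minimal sketch of both the plain and the corrected V for an r x k table of counts follows; the function names and the two example tables are made up for illustration, and the code assumes every row and column total is positive.

```python
import math

def chi_square(table):
    """Pearson chi-square and total count for an r x k table (all margins > 0)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Plain Cramer's V: sqrt((chi2 / n) / min(k - 1, r - 1))."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))

def cramers_v_corrected(table):
    """Bias-corrected V, following the correction shown on the Wikipedia page."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    k_adj = k - (k - 1) ** 2 / (n - 1)
    r_adj = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(k_adj - 1, r_adj - 1))

small = [[6, 4], [4, 6]]                   # hypothetical small sample
large = [[13, 2_987], [1_000, 996_000]]    # hypothetical rare-event table
for name, t in (("small", small), ("large", large)):
    print(name, round(cramers_v(t), 4), round(cramers_v_corrected(t), 4))
```

In this sketch the correction matters for the small table and barely moves the value for the million-row table, since the adjustment terms shrink as n grows.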

 

Is that of any help to you?

 

SteveDenham
